DTIC FILE COPY AD-A142 45/


DOT/FAA/CT-83/15


The  Measurement  of  Pilot 
Performance:  A  Master- 
Journeyman  Approach 


Earl  S.  Stein 


May 1984
Final  Report 


This  document  is  available  to  the  U.S.  public 
through  the  National  Technical  Information 
Service,  Springfield,  Virginia  22161. 



U.S. Department of Transportation

Federal Aviation Administration

Technical Center

Atlantic City Airport, N.J. 08405




NOTICE 


This document is disseminated under the sponsorship of
the Department of Transportation in the interest of
information exchange. The United States Government
assumes no liability for the contents or use thereof.

The United States Government does not endorse products
or manufacturers. Trade or manufacturer's names appear
herein solely because they are considered essential to
the object of this report.




Technical Report Documentation Page

1. Report No.
CT-83/15

3. Recipient's Catalog No.

4. Title and Subtitle
THE MEASUREMENT OF PILOT PERFORMANCE:
A MASTER-JOURNEYMAN APPROACH

5. Report Date
May 1984

6. Performing Organization Code

7. Author(s)
Earl S. Stein

8. Performing Organization Report No.
DOT/FAA/CT-83/15

9. Performing Organization Name and Address
Federal Aviation Administration
Technical Center
Atlantic City Airport, New Jersey 08405

10. Work Unit No. (TRAIS)

11. Contract or Grant No.
161-301-150

12. Sponsoring Agency Name and Address
U.S. Department of Transportation
Federal Aviation Administration
Technical Center
Atlantic City Airport, New Jersey 08405

13. Type of Report and Period Covered
Final Report

14. Sponsoring Agency Code

15. Supplementary Notes

16. Abstract

This project evaluated several methods for measuring pilot performance in a general
aviation simulator and examined the relationship between performance and workload.
An Automated Performance Measurement (APM) System was designed for use in a flight
simulator that was instrumented for digital data collection. Performance rating
was accomplished by three independent observers. Workload was assessed using a
real-time subjective input system with which pilots provided workload estimates
every minute.

Two groups of pilots participated in the experiment: ten professional high-time pilots
and ten recently qualified instrument pilots. Both the APM and the observer ratings
showed significant performance differences between the two pilot groups. The
automated technique showed more of a spread, however, among individuals in the
professional (masters) group. The newly qualified pilots (journeymen) reported
significantly higher workload than their masters counterparts, and their performance
was significantly worse.

17. Key Words
Task Difficulty; Task Load; Pilot Performance; Pilot Workload; Human Workload;
Human Performance; Automated Performance Measurement (APM)

18. Distribution Statement
Document is available to the U.S. public
through the National Technical Information
Service, Springfield, Virginia 22161

19. Security Classif. (of this report): Unclassified
20. Security Classif. (of this page): Unclassified
21. No. of Pages
22. Price

Form DOT F 1700.7
Reproduction of completed page authorized


TABLE  OF  CONTENTS 




Page 


EXECUTIVE  SUMMARY  vii 

INTRODUCTION  1 

The  Problem  1 

Reasons  for  Performance  Measurement  1 

What  is  Performance  Measurement?  2 

Behavior  Classification/Taxonomy  4 

Performance  Rating  5 

Automated  Performance  Measurement  7 

Pilot  Workload  9 

Research  Goal  10 

METHOD  10 

Research  Design  10 

Participants  11 

Equipment  11 

Procedure  12 

RESULTS  16 

Qualifications, Objectives and Strategy 16

Results  Summary  17 

DISCUSSION  55 

CONCLUSIONS  61 

REFERENCES  62 


APPENDICES

A Lesson Plans
B Training Briefing and Training Program
C List of GAT Variables
D Flight Performance Evaluation
E Participant Briefing
F Workload Scale Instructions
G Test Flight Briefing
H Flight Geometry
I Air Traffic Control Script
J Flight Workload Questionnaire
K Interrater Reliability Correlations - Masters
L Interrater Reliability Correlations - Journeymen




LIST  OF  ILLUSTRATIONS 


Figure  Page 

1  Sample  Flight  Track  Plot  15 

2  Histogram  of  the  Pilot  Performance  Index  Canonical  Variable  27 

3  Histogram  of  the  Performance  Rating  Canonical  Variable  35 

4  Scatterplot  of  Workload  Variables  —  Master  and  42 

Journeyman  Pilots 

5  Scatterplot  of  Workload  Variables  —  Master  Pilots  43 

6  Scatterplot  of  Workload  Variables  —  Journeyman  Pilots  44 

7  Scatterplot  and  Regression,  Automated  Performance  46 

Measurement  Ratings  —  Master  Pilots 

8  Scatterplot  and  Regression,  Automated  Performance  47 

Measurement  Ratings  —  Journeyman  Pilots 

9 Scatterplot and Regression, Automated Performance 48

Measurement  Ratings  —  All  Pilots 

10 Scatterplot and Regression, Inflight Workload and 49

Automated  Performance  Measurement  —  Master  Pilots 

11  Scatterplot  and  Regression,  Inflight  Workload  and  50 

Automated  Performance  Measurement  —  Journeyman  Pilots 

12  Scatterplot  and  Regression,  Inflight  Workload  and  51 

Automated  Performance  Measurement  —  All  Pilots 

13  Scatterplot  and  Regression,  Postflight  Workload  and  52 

Automated  Performance  Measurement  —  Master  Pilots 

14 Scatterplot and Regression, Postflight Workload and 53

Automated  Performance  Measurement  —  Journeyman  Pilots 

15 Scatterplot and Regression, Postflight Workload and 54

Automated  Performance  Measurement  —  All  Pilots 

16  Scatterplot  and  Regression,  Postflight  Workload  Factor  56 

and  Performance  Rating  Totals  —  Master  Pilots 

17  Scatterplot  and  Regression,  Postflight  Workload  Factor  57 

and  Performance  Rating  Totals  —  Journeyman  Pilots 

18  Scatterplot  and  Regression,  Postflight  Workload  Factor  58 

and  Performance  Rating  Totals  —  All  Pilots 




LIST  OF  TABLES 


Table Page

1  List  of  Variables  Within  Each  Flight  Segment  14 

2  Flight  Variable  Screening  Using  Analysis  of  Variance  19 

3  Pilot  Performance  Index  Variable  List  20 

4 Analysis of Variance on PPI Segment Scores - All PPI 22

Variables  Included 

5  Analysis  of  Variance  on  PPI  Segment  Scores  after  Deletion  22 

of  Selected  Variables 

6  Mean  Automated  Performance  Scores  Using  PPI  23 

7  Automated  Performance  Scores,  PPI  Analysis  of  Variance  23 

8  Newman-Keuls  Analysis  of  PPI  Segments  Effects  24 

9  Multilinear  Regression  on  PPI  Scores  25 

10  Stepwise  Regression  on  PPI  Scores  (Flights  Pooled)  27 

11  Interrater  Reliability  Correlations  28 

12  Interrater  Reliability  Employing  Segment  Means  for  29 

Each  Rater  as  Data  Points  for  Correlations 

13  Analysis  of  Variance  on  Flight  Segment  Performance  Ratings  30 

14 Mean Performance Ratings 30

15  Performance  Rating  Analysis  of  Variance  Summary  31 

16 Performance Ratings Newman-Keuls Analysis for Flight 32

Segments  Effects 

17  Multilinear  Regression  Data  on  Performance  Ratings  33 

18  Stepwise  Regression  on  Performance  Ratings  (Flights  Pooled)  35 

19  Mean  Inflight  Workload  Responses  36 

20  Inflight  Workload  Analysis  of  Variance  Summary  37 



21 Newman-Keuls Analysis on Workload Segments Main Effect 38

(Inflight) 

22  Mean  Delay  (Seconds)  Data  Summary  38 

23  Inflight  Response  Delay  Analysis  of  Variance  Summary  39 

24 Postflight Questionnaire Results 40

25 Factor Loadings of Postflight Questionnaire 41




EXECUTIVE  SUMMARY 


Problem: Modern aviation has produced highly complex person-machine systems.
The evaluation of operator performance, particularly that of pilots, has been a
serious problem that has made system development more difficult. In the early
days of aviation, instructor pilot opinion was all that was required. As systems
became more complex and research questions became increasingly sophisticated,
more measurement precision was required.

Today, performance measures run the gamut from refined methods of obtaining
observer opinion to Automated Performance Measurement (APM), which employs
computers to compare what pilots are doing against precise standards. The
current project examined several methods of measuring pilot performance and
evaluated the results against measures of pilot workload. The primary purpose
of the experiment was to determine whether a new automated measurement system,
developed at the Federal Aviation Administration (FAA) Technical Center, could
differentiate pilots based on their performance during simulated flight.

The development and testing of this measurement system were stimulated by a specific
technical program: the Cockpit Display of Traffic Information (CDTI) program. This
program was organized to explore the impact of traffic information displays on
aircrew behavior. However, it became apparent at the beginning of the program that
existing measures of aircrew performance and workload were inadequate. This led to
the effort described in this report to create the Pilot Performance Index (PPI).

Method: The PPI was developed analytically by several subject matter experts, who
were themselves high-time pilots. The PPI was built by dividing a normal
regime of flight into six segments (takeoff, climb, en route, descent, initial
approach, and final approach) and then identifying the variables important
for the successful completion of each segment; for the climb segment, for example,
these included airspeed, heading, and instantaneous vertical speed. For each of these
variables an ideal value was selected based on the operating characteristics of the
aircraft. A computer automatically sampled the aircraft state and compared the obtained
values against these standards. The closer the two sets of numbers were, the higher the
pilot's performance score. This technique assumed that pilot performance could be
inferred from how well the aircraft was performing at any given time.
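The comparison of sampled aircraft state against ideal values can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PPI as implemented: the report does not give its scoring function here, so the linear credit-per-sample rule, the `tolerance` parameter, and the variable names and values in the example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Standard:
    """Ideal value for one flight variable, plus an assumed tolerance band."""
    ideal: float
    tolerance: float  # hypothetical: deviation at which a sample earns no credit


def sample_score(observed: float, std: Standard) -> float:
    """Credit for one sample: 1.0 on target, falling linearly to 0.0 at the tolerance."""
    deviation = abs(observed - std.ideal)
    return max(0.0, 1.0 - deviation / std.tolerance)


def segment_score(samples: dict[str, list[float]],
                  standards: dict[str, Standard]) -> float:
    """Average sample credit over all variables in a segment, scaled to 0-100."""
    credits = [
        sample_score(value, standards[name])
        for name, values in samples.items()
        for value in values
    ]
    return 100.0 * sum(credits) / len(credits)


# Hypothetical climb-segment standards and once-per-second samples (illustrative only).
climb_standards = {
    "airspeed_kt": Standard(ideal=120.0, tolerance=20.0),
    "vertical_speed_fpm": Standard(ideal=500.0, tolerance=500.0),
}
climb_samples = {
    "airspeed_kt": [120.0, 125.0, 130.0],
    "vertical_speed_fpm": [500.0, 750.0],
}
print(segment_score(climb_samples, climb_standards))  # 75.0
```

Because every sample is judged against the same fixed standard, a pilot who holds the aircraft closer to the ideal values accumulates more credit, which mirrors the assumption that pilot performance can be inferred from aircraft state. A real heading variable would also need wrap-around handling (359 versus 1 degree), which this sketch omits.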

In addition to the PPI, two other measures were designed for this experiment. The
first was a more traditional performance measure based on observer ratings. One
observer rode on each simulated flight, and two others made independent observations
using videotapes of the cockpit instrument panel. The second assessed aircrew
subjective perceptions of workload, using both an inflight technique, also developed
at the FAA Technical Center, and a postflight questionnaire.

The basic research design involved selecting two diverse groups of pilots and
determining whether the measures would separate the groups in terms of performance.
The first group, known as masters, were all professional pilots whose median flight
time was 6,075 hours. The second group, or journeymen, were relatively new instrument
pilots (median flight time of only 161.5 hours) who had been trained in another FAA
program.




All participants were volunteers. Each flew a standard instrument "round-robin"
flight plan in a Singer-Link General Aviation Trainer (GAT), which simulated a
Cessna 421, a light twin-engine, cabin-class aircraft. The simulator had no external
visual capability but was equipped for the collection of digital aircraft state
information such as position in space, airspeed, and heading. This information was
sampled once per second during each flight, which lasted approximately 35 minutes.

Inflight workload was collected using a response box mounted below the throttles.
The box contained ten push buttons numbered from 1 to 10. The buttons were
verbally anchored during a preflight briefing using a modification of the
Cooper-Harper technique.

Results: A preanalysis of the pilot performance index was employed to eliminate
scales within flight segments that failed to separate the two groups of pilots.
Because none of the scales in the takeoff segment showed any performance difference,
the entire segment was deleted from further analysis. An analysis of variance was
then computed across the segments of flight and across the two replicated flights
to compare the two pilot groups. The analysis showed that the master pilots
performed consistently better than the journeymen in all segments of flight. There
was a slight tendency for both groups of pilots to improve their performance across
the two flights. The PPI appeared to function as expected.
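The screening step can be illustrated with a one-way analysis of variance between the two groups for each candidate variable. This is a simplified sketch, assuming one summary score per pilot per variable and a fixed critical F value; the report's actual analyses also crossed flight segments and the two replicated flights, and the variable names and data below are hypothetical.

```python
def one_way_f(groups: list[list[float]]) -> float:
    """One-way ANOVA F ratio: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def screen_variables(masters: dict[str, list[float]],
                     journeymen: dict[str, list[float]],
                     f_critical: float) -> list[str]:
    """Keep only variables whose master/journeyman difference exceeds the critical F."""
    return [
        name for name in masters
        if one_way_f([masters[name], journeymen[name]]) > f_critical
    ]


# Hypothetical per-pilot error scores, three pilots per group for brevity.
masters = {"airspeed_error": [1.0, 2.0, 3.0], "heading_error": [5.0, 6.0, 7.0]}
journeymen = {"airspeed_error": [7.0, 8.0, 9.0], "heading_error": [5.0, 6.0, 7.0]}
# The .05 critical value of F(1, 4) is about 7.71.
print(screen_variables(masters, journeymen, f_critical=7.71))  # ['airspeed_error']
```

With ten pilots per group, as in this study, a two-group comparison has 1 and 18 degrees of freedom, for which the tabled .05 critical value is about 4.41. Under this sketch, a segment whose variables all fail the screen (as the takeoff segment did) would contribute nothing to the index.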

The performance ratings made by three independent observers were also analyzed.
The level of agreement between raters, an index of measure reliability, was high
for the flight segment performance scores, exceeding r = .90. The data from the
three raters were averaged and then analyzed using analysis of variance. There was
again a clear separation between the two pilot groups, with the masters doing
consistently better.
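The agreement check and averaging described above can be sketched as follows. The report states only that interrater agreement exceeded r = .90 and that the three raters' data were averaged; the use of pairwise Pearson correlations here, the rater labels, and the example scores are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def interrater_correlations(ratings: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Correlate every pair of raters over the same set of rated flight segments."""
    return {
        (a, b): pearson_r(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


def mean_ratings(ratings: dict[str, list[float]]) -> list[float]:
    """Average the raters' scores segment by segment."""
    per_segment = zip(*ratings.values())
    return [sum(scores) / len(scores) for scores in per_segment]


# Hypothetical segment ratings from one in-cockpit observer and two videotape raters.
ratings = {
    "observer": [6.0, 7.0, 5.0, 8.0],
    "video_a": [6.0, 8.0, 5.0, 8.0],
    "video_b": [5.0, 7.0, 4.0, 7.0],
}
for pair, r in interrater_correlations(ratings).items():
    print(pair, round(r, 2))
print(mean_ratings(ratings))
```

The same correlation function would apply to the comparison between rating totals and automated performance scores reported in the following paragraphs.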

The spread in performance scores for the master pilots was considerably greater in
the PPI data than in the observer ratings. The observers were apparently less able
than the automated PPI to make fine discriminations between the members of the
fairly homogeneous masters group. There was, however, a great deal of variability
in journeymen scores for both types of measures.

The pilot performance rating totals for each flight correlated well with the
automated performance measures. The obtained correlation was r = .82, indicating
considerable agreement between the traditional expert opinion results and those
produced by the newer automated technique.

Both measures of workload, the inflight technique and the postflight
questionnaire, showed significantly higher reported workload for the journeymen
pilots than for the master pilots. Correlations between measures of workload
and performance produced an interesting phenomenon. When all pilots were
considered, the correlations tended to be negative: the higher the workload, the
poorer the measured performance. The journeymen felt that they were working
harder, but their performance, reflecting their lack of experience, did not show
the benefit of that effort.




Conclusions: (1) An APM System called the PPI was successfully tested, and it did
what it was designed to do. (2) Both the automated performance measure and the
observer ratings separated the two pilot groups in terms of performance. (3) The
APM System was better able than the observer ratings to spread out individual
performances. (4) Master pilots reported consistently lower workload and produced
consistently better overall flight performance than the journeymen. (5) An inverse
relationship between workload and performance existed, with the journeymen reporting
higher workload but demonstrating poorer performance.




INTRODUCTION 


THE  PROBLEM. 

The  evaluation  of  operator  performance  has  been  a  major  problem  for  system 
development.  It  has  become  apparent  that  the  more  complicated  the  system,  the 
more  difficult  it  is  to  measure  performance.  The  advent  of  aviation  has  generated 
a  significant  number  of  questions  concerning  person-machine  relationships  and 
performance  criteria. 

The  first  large-scale  selection  of  pilots  occurred  during  World  War  I.  At  that 
time,  methods  for  selection  and  training  performance  evaluation  had  to  be 
established  quickly.  This  was  the  beginning  of  the  identification  of  a  number  of 
problems  to  which  ideal  solutions  have  yet  to  be  found.  Pilots  must  operate  in  a 
highly  dynamic  environment  in  which  there  is  a  continuous  flow  of  constantly 
changing  demands  and  information.  Pilots  must  function  in  multiple  dimensions 
simultaneously.  These  factors  make  the  definition  and  measurement  of  performance  a 
very  difficult  task. 

Much  of  the  work  that  has  been  accomplished  on  aircrew  performance  has  focused  on 
the  military  training  environment  and,  to  some  extent,  on  the  operations  of  air 
transport crews. Very little has been done to develop systematic measures for
general aviation pilots, who are numerous in the airspace.

This report describes work accomplished by the Federal Aviation
Administration Technical Center's Applied Human Factors Program. This program
developed  an  automated  performance  measurement  tool  as  part  of  the  Technical 
Center's  Airborne  Simulation  Facility.  This  tool  was  designed  so  that  it  could  be 
used  to  evaluate  the  impact  on  pilot  performance  of  future  systems  changes,  such  as 
equipment  modifications  and  new  air  traffic  control  procedures. 

The balance of this introduction is organized into seven sections. The first three
discuss why performance measurement is necessary and how it has traditionally been
accomplished. The next two sections review some of the background history of
two major types of measurement: performance rating and automated performance
measurement. The sixth section introduces the complexity of pilot workload
evaluation, and the final section describes the immediate goals of this research
work.

REASONS  FOR  PERFORMANCE  MEASUREMENT. 


Throughout the history of aviation, there have been many varied efforts to evaluate
the performance of pilots in flight. Most of these efforts have served two primary
purposes: training and certification. According to Farrell (1973), tests of pilot
performance have existed for over 50 years. Measuring performance on complex tasks
in a practical manner is a major problem (Povenmire, Alvarres, & Damos, 1970). Early
trainers, however, rediscovered a basic principle of learning: knowledge of results
through feedback improves performance. Applying this principle means that training
can be more cost-effective and marginal trainees can be screened out early in the
program.




Early efforts to examine training performance were very basic and usually involved
little more than the instructor's judgment. The requirements for certification of
pilots increased the need for performance standards and measures. Prior to
World War II, the Civil Aeronautics Administration attempted to develop an
objective pilot rating scheme under the Civilian Pilot Training Program (North and
Griffin, 1977). This effort failed because the procedures were too costly and
time-consuming to administer.

During World War II, the selection and training of pilots in large numbers
again became a major undertaking. This also led to early concepts of the
person-machine interface and anticipated systems design. Research workers leaving the
military at the end of the war began exploring human performance as an indicator
of equipment design adequacy. For example, Obermayer and Vreuls (1974) viewed
measurements as a bridge between training and operational situations. Modern
systems approaches require a concern not only for hardware but also for the people
who must operate it. In order to properly evaluate new systems, procedures, and
concepts, a determination of operator performance in a person-machine system
becomes essential. This fosters an examination of those variables that influence
performance. Equipment is becoming increasingly reliable, and the weak link in any
person-machine system is often the human operator (Roscoe, 1978).

WHAT IS PERFORMANCE MEASUREMENT?

Skjenna (1981) noted that one's worst judge is oneself, especially when it comes to
performance. Individuals who feel they possess conventional wisdom (the ultimate
truth) based on their experience with a system may well be mistaken and are likely to
draw erroneous conclusions about performance (Poulton, 1975).

Before any measurement can be accomplished, two things are required. The first is
acceptance of the idea that employing a measurement philosophy is superior to
making decisions based on individual judgment alone. In a research environment,
there is really no alternative if adequate precision is to be achieved. The second
requirement is a definition of whatever it is that must be measured. Although
"performance" has been used as if it were a universally accepted term, in reality
it is not. Gerathewohl (1978) made the distinction between performance and
proficiency. Performance referred to the execution of an action of more or less
specific function, such as pulling a lever or throwing a switch. Proficiency, in
contrast, was related to the integration of multiple actions. This integration
itself was thought to be a desirable quality of a safe pilot.

Whichever term is used, performance or proficiency, it implies that an operator or
a person-machine system accomplishes specific behaviors or tasks under certain
restraining conditions. The evaluation of performance involves examining
behavior over a period of time and comparing accomplishment to a set of evaluative
standards (Vroom, 1964). The determination of these standards is a major problem
in any measurement scheme, known in industry and education as the
"criterion problem." Several alternatives offered by Berliner, Angell,
and Shearer (1964) include comparison against the performance of others
(a normative approach) and/or against the achievement of known experts, i.e., master
pilots. Another alternative is to establish an absolute standard of satisfactory
performance against which to compare individual behavior. Conolly, Shuler, and
Knoop (1969) described three types of models which might be useful for the
derivation of a unique set of performance measures. These included: (1) state




transfer  measures  based  on  the  overall  trends  in  behavior,  (2)  absolute  measures, 
where  performance  is  compared  with  a  standard,  and  (3)  relative  measures,  which  are 
based  on  the  relationship  of  other  measures. 
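The alternatives above for choosing a standard (comparison with other performers, comparison with known experts, and an absolute standard of satisfactory performance) can be contrasted in a short sketch. The function names, the 0-100 score scale, the threshold, and the example scores are hypothetical; the point is only that the same raw score can be judged quite differently depending on the reference chosen.

```python
from statistics import mean, stdev


def meets_absolute_standard(value: float, threshold: float) -> bool:
    """Absolute standard: does the score reach a fixed level of satisfactory performance?"""
    return value >= threshold


def normative_z(value: float, group: list[float]) -> float:
    """Normative comparison: z-score of the value within a reference group of peers."""
    return (value - mean(group)) / stdev(group)


def expert_referenced(value: float, master_scores: list[float]) -> float:
    """Expert comparison: score as a fraction of the masters' mean (higher is better)."""
    return value / mean(master_scores)


# Hypothetical performance scores on a 0-100 scale.
journeymen = [55.0, 60.0, 65.0, 70.0, 75.0]
masters = [85.0, 90.0, 95.0]
score = 70.0
print(meets_absolute_standard(score, threshold=75.0))  # False
print(round(normative_z(score, journeymen), 2))
print(round(expert_referenced(score, masters), 2))
```

In this illustration the same score fails the absolute standard, sits above the average of its peer group, and reaches a bit under four-fifths of the masters' mean, which is why the choice of standard is central to the criterion problem.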

Measurement is further complicated by the multidimensional nature of the cockpit
environment. The various approaches to classifying these dimensions will be
discussed later. Not only are a pilot's tasks multidimensional, but his
or her skill (the degree to which proficiency has been attained) can also vary across
tasks (e.g., communication and navigation) and across time (Farrell, 1973).
Pilots, being human, do not always perform consistently at their highest skill
level. Fleishman (1967) pointed out that a measurement system is seldom applicable
beyond the specific setting for which it was designed. This is a particular
problem in research because each setting is often unique to the current research
question. Roscoe (1978) lamented that the human pilot could not be measured with
the same precision as a mechanical system. Such precision, however, is still beyond
the current state of the art.

Several  researchers  have  attempted  to  define  standards  for  performance  measurement 
systems  used  in  aviation.  It  could  be  said  that  measures  have  traditionally  varied 
on  two  continua:  (1)  objective  -  subjective  and  (2)  quantitative  -  qualitative. 

Objective performance measurement usually involves the use of identifiable
standards against which to compare the observed behavior. The more subjective a
measure is, the more dependent it becomes on an observer's internalized model or
construct of what performance should be. The second continuum refers to
the assignment of numbers to performance in a systematic way that reflects the
quality of the performance. A completely qualitative evaluation uses no numbers at
all, while a completely quantitative approach employs numbers exclusively. The two
continua interact in terms of measurement philosophy. Performance evaluation can
be both quantitative and subjective; this would occur, for example, when using a
performance rating system in which no standards are employed. With the inclusion of
observable standards, the measure moves somewhat toward the objective end of the
continuum.

Research workers are divided concerning the relevance of the different types of
measures. Poulton (1975) felt that objective measures should be used whenever
possible but accepted that objective measurement in the purest sense is not always
possible. Gerathewohl (1978a) indicated that the best approach was a multivariate
method that maximized the advantages of a number of different types of techniques.
Virtually everyone in research accepts the need for quantification and some level
of objectivity. Without these elements, measures are unlikely to be reliable and
valid.

Reliability  refers  to  both  the  internal  consistency  of  a  measure  and  its  tendency 
to  measure  consistently  over  time.  Validity,  in  contrast,  is  the  degree  to  which 
the measure accurately evaluates whatever it was designed to evaluate. For
example,  a  pilot  performance  measure  which  is  unduly  influenced  by  irrelevant 
factors  might  be  said  to  be  invalid. 




In addition to reliability and validity as criteria for effective pilot performance
measurement, Farrell (1973) has included ease of use, diagnostic value, safety, and
cost. McDowell (1978) felt that measures should also be interpretable, invariant
with respect to time, immediately available, invariant with respect to the
instruments used to collect them, and, finally, task relevant. Vreuls and
Obermayer (1973) noted that aircrew performance involves a great deal of
continuously varying information. The advent of cockpit automation further
complicates the situation and requires very clear definitions of what measures are
to be used and under what conditions. Vreuls and Obermayer (1973) indicated that
there are several alternatives for the definition of measures. These range
from an analytical "armchair" method based on a literature survey and experience to
actual observation and measurement in the cockpit in order to pretest candidate
techniques.

Before  any  of  this  can  begin,  a  description  of  what  it  is  pilots  do  in  the  cockpit 
must  be  developed.  From  this  description  will  evolve  both  measures  and  performance 
standards  or  criteria.  This  brings  us  to  attempts  to  classify  pilot  behavior. 

BEHAVIOR CLASSIFICATION/TAXONOMY.

Because  flying  involves  so  many  different  kinds  of  behaviors,  a  classification 
system  is  essential  if  measurement  is  to  be  accomplished.  Taxonomy  is  the  science 
of  how  to  classify  and  identify.  According  to  Fleishman  (1982),  many  differences 
in  the  research  results  across  performance  studies  may  have  been  caused  by 
variability  in  taxonomic  systems.  A  primary  purpose  for  classification  in  science 
is  to  clarify  a  description  of  relationships  between  objects  or  events  and  allow 
general  statements  about  classes  or  taxons  of  events.  A  problem  which  has  occurred 
in  aviation  human  factors,  as  well  as  in  the  study  of  other  person-machine  systems, 
is  that  classification  has  often  been  accomplished  without  due  regard  to  the 
consistency of the rules for assigning behaviors to categories. Many categories
(e.g., thinking, motor responses) are too general, while other categories (e.g.,
pilot rotating knob A) that are derived from a detailed task analysis are too
specific to be of practical use for performance evaluation in a complex system.

In  aviation,  behavioral  taxonomies  have  varied  considerably  in  terms  of  their 
specificity. Christensen and Mills (1967) classified behavior into four
categories: perceptual processes, mediational processes, communication, and motor
processes.

Sheridan and Simpson (1979) stated that there were four main classes of pilot
behavior: communication, navigation, guidance, and aircraft systems monitoring and
management.

These  authors  also  described  certain  characteristics  of  flight  tasks  in  general. 
Tasks  often  arrive  randomly  and  may  or  may  not  be  expected  by  the  pilot.  Tasks 
vary  in  terms  of  priority,  and  some  may  be  deferred  while  others  are  not.  Finally, 
some  discrete  tasks  may  have  to  be  performed  in  a  specific  sequence. 

Classification  systems  have  contained  categories  described  by  general  behavioral 
terms,  such  as  those  of  Engel  (1970).  His  list  included  visual  discrimination, 
auditory  discrimination,  manipulation,  decisionmaking,  symbolic  data  operation,  and 
reporting.  These  systems  have  also  included  taxonomies  which  were  very  specific  to 
the aviation world. Shannon (1980a, b) divided his system into two general areas,
continuous and discrete operations. The former referred to such behaviors as
maintaining altitude, airspeed, and heading, while the latter included planning and
anticipating flight status changes and making the appropriate corrections. Shannon
(1980a) felt that the key aspects of pilot performance were basic airwork,
physical coordination, scan pattern, the ability to plan ahead, time-sharing
across tasks, and handling what he referred to as "workload stress."

Gerathewohl  (1978b)  summarized  a  variety  of  taxonomies.  He  stated  that  a 
flight  task  analysis  could  occur  anywhere  on  a  continuum  from  molecular  to  molar. 
Combining  a  number  of  these  taxonomies,  the  author  established  what  he  thought  were 
the  major  tasks  of  flight:  mission  and  flight  planning;  takeoff  and  departure; 
cruise, flight and mission operations; emergency procedures; and termination of
the  flight. 

Gerathewohl  (1978a)  saw  a  place  for  both  a  generic  type  of  taxonomy  using  terms 
such  as  sensorimotor  coordination  and  motivation  and  for  the  flight  specific 
classification  which  focuses  on  overt  pilot  behavior.  This  latter  approach  is 
particularly  relevant  for  a  relatively  new  measurement  approach,  Automated 
Performance  Measurement  (APM),  which  will  be  discussed  later. 

This  section  has  attempted  to  show  that  the  classification  of  aircrew  behavior  has 
direct  measurement  implications.  There  is  currently  no  generally  accepted  taxonomy 
and  each  is  usually  created  for  a  specific  purpose.  The  research  to  be  described 
in  the  method  section  of  this  report  has  followed  this  tradition,  selecting  a 
classification  scheme  appropriate  to  the  immediate  need. 

The  next  two  sections  of  this  introduction  will  describe  the  background  in  the 
research  literature  of  two  general  classes  of  measurement  on  the  objective- 
subjective  continuum.  This  will  include  performance  rating  and  automated 
performance  measurement. 

PERFORMANCE RATING.

Rating  scales  and  checklists  have  been,  by  far,  the  most  popular  evaluative  tools 
for cockpit performance. Rating techniques using a human observer have both
advantages  and  liabilities.  Knoop  and  Welde  (1973)  saw  a  need  for  observer  data 
even  if  more  objective  data  were  available.  Some  behaviors,  they  felt,  do  not  lend 
themselves to automated scoring. These include decisionmaking, planning,
confidence,  and  time  sharing.  Povenmire,  Alvarres,  and  Damos  (1970)  emphasized  the 
practicality,  simplicity,  and  low  cost  of  rating  procedures  if  they  could  be  made 
adequately  reliable.  Leibowitz  and  Post  (1982)  described  the  unique  capabilities 
of  the  human  observer.  The  observer  can  integrate  complex  stimuli  which  may 
involve  judgment  features  that  are  impossible  to  preprogram  into  a  mechanical 
system.  Further,  the  observer  can  differentiate  the  relevant  from  the  irrelevant. 
McDowell  (1978)  viewed  performance  rating  as  particularly  useful  in  a  training 
environment  but  questioned  its  effectiveness  in  research,  where  more  precision  is 
required.

Because  performance  ratings  are  so  easy  to  develop,  or  appear  to  be  on  the  surface, 
they  have  traditionally  been  unreliable  and  have  had  little  more  than  face  (the 
appearance  of)  validity.  There  are  a  number  of  sources  of  variance  in  the  ratings 
which  have  little  to  do  with  performance.  These  include,  but  are  not  limited  to, 
observer biases, skill variability, internalized standard variability, and observer
expectations. Often ratings are developed without an adequate description of the
behavior  to  be  evaluated.  The  importance  of  an  effective  taxonomy  cannot  be 
overstated.  Poulton  (1975)  cautioned  that,  when  ratings  were  employed,  they  should 
be  focused  on  specific  task  performance  rather  than  on  general  behavior. 

There  have  been  a  number  of  attempts  to  develop  reliable  pilot  performance  ratings. 
For example, Povenmire et al. (1970) worked with the Illinois Private Pilot Flight
Performance Scale. This is a five-point scale: 5-superior, 4-passing, 3-just
barely below passing, 2-well below passing, and 1-failure. They used this scale
to  evaluate  student  pilot  performance  in  a  flight  simulator.  Twenty  maneuvers 
described  in  the  Federal  Aviation  Administration's  (FAA's)  "Private  Pilot  Test 
Guide"  were  employed  in  their  experiment.  What  made  their  approach  unique  for  its 
time  was  the  way  they  developed  standards.  They  had  a  group  of  instructor  pilots 
write  performance  descriptions  for  each  point  on  the  five-point  scale  of  all  the 
maneuvers. Three levels of student experience were sampled: 15, 25, and 35 flight
hours. Results indicated pilot performance improvement across the three levels.
More importantly, the interrater reliabilities between the two independent raters
ranged  from  r  =  .45  to  r  =  .82.  The  higher  end  of  the  range  was  quite  acceptable. 
However,  one  cannot  ignore  the  low  end  of  r  =  .45,  which  is  not  unusual  when  using 
rating  techniques. 
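Interrater reliability of the kind Povenmire et al. reported is commonly expressed as a Pearson product-moment correlation between two raters' scores on the same set of performances. The sketch below computes r directly from its definition; the rater scores in the usage line are hypothetical values on the five-point scale and are not data from that study. (Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity.)

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = fmean(x), fmean(y)
    # Sum of cross-products and sums of squares about each rater's mean.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical 1-5 ratings of the same five maneuvers by two independent raters.
rater_a = [5, 4, 3, 4, 2]
rater_b = [5, 3, 3, 4, 2]
r = pearson_r(rater_a, rater_b)  # about .92 for these made-up scores
```

A value near the low end of the reported range (r = .45) would mean that one rater's scores predict relatively little about the other's.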

There  have  been  some  observer-based  performance  evaluation  projects  which  have 
moved  beyond  traditional  rating  techniques  and  may  serve  to  bridge  the  logical  gap 
between  rating  and  APM.  Melton,  McKensie,  Kellin,  Hoffman,  and  Saldivar  (1975) 
were  concerned  with  the  evaluation  of  pilot  behavior  in  a  general  aviation  trainer. 
They  mounted  a  still  camera  where  it  was  focused  on  the  instrument  panel  of  the 
simulator.  A  series  of  photographs  was  taken  while  pilots  flew  climbs,  descents, 
turns,  and  straight  and  level  segments.  Deviations  from  assigned  values  for 
airspeed,  altitude,  and  heading  were  manually  extracted  from  the  photographs 
sometime after the flights. In contrast, Childs (1979) developed a criterion-referenced
performance scoring procedure for Army helicopter pilots. This too was
observer  based,  but  was  accomplished  by  an  instructor  pilot  in  real-time  during 
flight  simulation.  The  observer  was  required  to  record  specific  instrument  values 
at  a  prescribed  sampling  rate.  The  limiting  factor  in  this  technique  was  the 
ability  of  the  observer  to  process  all  the  information  required  and  maintain 
accurate records. Damos and Lintern (1981) used a similar procedure. Instead of
recording actual instrument values, observers assigned scale values from 0 to 3 for
each variable based on deviations from bank, altitude, rollout, heading, and
airspeed. Criteria were employed for specific levels of deviation from standards
(e.g., cruise at 165 ± 10, which might only rate a scale value of 2).

These  last  three  studies,  although  observer  based,  shared  certain  things  in 
common  with  APM.  They  were  quantitative  and  leaned  toward  the  objective. 
They  also  shared  a  basic  assumption  with  APM.  This  assumption  is  that  the 
state  of  an  aircraft  at  any  point  in  time  while  in  flight  is  a  direct  reflection 
of the performance of the individual who is flying it. This is an
oversimplification because sudden deviations in flight state induced by weather and
other  uncontrollable  factors  must  be  taken  into  account.  On  the  average,  though, 
flight  status  and  aircrew  performance  are  assumed  to  be  completely  linked. 




AUTOMATED  PERFORMANCE  MEASUREMENT. 


The  use  of  APM  has  been  a  relatively  recent  innovation  in  pilot  performance 
research.  Fuller,  Wagg,  and  Martin  (1980)  noted  that  the  United  States  Air  Force 
began  a  developmental  program  in  1968  aimed  at  the  design  of  objective  measures 
of  performance.  As  indicated  earlier,  APM  is  based  on  assumptions  that  flying 
performance  has  characteristics  which  are  reflected  in  certain  parameters.  These 
include, but are not limited to: maintaining the aircraft state within limits,
avoiding  excessive  rates  and  acceleration  forces  so  that  maneuvers  are  smooth, 
flying  with  minimum  effort  and  avoiding  overcontrol,  and  not  exceeding  procedural 
or  safety  limits.  APM  has  been  characterized  by  both  simulation  and  inflight 
studies  with  researcher  preference  leaning  toward  simulation.  As  Knoop  and 
Welde  (1973)  commented  concerning  their  efforts  to  automate  performance  data 
collection  in  the  T-37  aircraft,  "It  is  not  easy  to  collect  good  inflight 
performance  data  (p.  235)." 

APM  by  definition  requires  the  use  of  computers  to  collect  performance  data 
concerning  aircraft  state  and/or  control  input  parameters.  Once  the  data  are 
collected,  they  can  be  compared  against  standards  which  have  been  developed  either 
analytically  or  empirically.  The  advantages  of  such  a  system  are  obvious.  The 
computer  is  completely  objective  and  can  process  a  great  deal  of  information 
rapidly.  However,  the  researcher  is  left  with  the  criterion  problem  because 
somehow the standard values still have to be developed. Also, the computer does
not  "see"  everything  and  can  only  process  what  it  has  been  programmed  to  process. 
Farrell  (1973)  has  noted  that  APM  measures  deviations  from  standards  but  does  not 
interpret  the  significance  of  the  resultant  scores.  A  number  of  researchers  have 
cautioned  that  performance  ratings  should  not  be  discarded  even  if  APM  becomes  a 
well-articulated discipline, which it currently is not.

While there have been several reasonable reviews of the APM literature, which is
still fairly limited, a brief summary of this work is given here so
that the reader can become familiar with this type of research. The reader is also
referred to Gerathewohl (1978a) and Fuller, Wagg, and Martin (1980).

Henry,  Turner,  and  Matthie  (1974)  described  what  must  have  been  an  early, 
low-budget  APM  study.  They  designed  a  measurement  system  built  primarily  around 
surplus equipment. This system centered on an old Link 8 computer which produced
a punched paper tape as a data record. Aircraft status was compared against
standards surrounded by threshold data "windows." Scores were determined by
using  analog  information  and  voltages  representing  key  variables  (altitude, 
airspeed,  heading,  vertical  velocity,  turn  rate,  and  turn  coordination).  These 
were  compared  against  standard  voltages.  The  system  was  used  to  demonstrate 
decreased  performance  when  pilots  ingested  alcohol. 

Hill  and  Goebel  (1971)  also  used  the  Link  8  computer,  but  no  mention  was  made  of 
paper  tape.  Using  a  General  Aviation  Trainer  (GAT  1),  they  collected  data  on 
eight  basic  flight  variables  that  they  managed  to  process  into  266  measures,  many 
of which were highly correlated. Three groups of participants flew preestablished
flight segments. The three groups included one with no experience, one with
25 to 50 hours of flight, and one whose members averaged over 100 hours. The
object of the study was to determine if the automated performance measures would
discriminate across the three groups. Results indicated that 27 of the measures
would discriminate. However, the authors were unable to cross-validate their
results  in  a  second  similar  experiment.  Part  of  the  problem  may  have  been  the 
relatively  high  number  of  variables  and  small  number  of  participants,  ten  in  each 
group. 

This  brings  out  a  problem  seen  in  many  APM  studies.  One  can  easily  collect  a  great 
amount  of  data  with  only  a  small  number  of  participants.  This  has  created  a 
considerable  statistical  problem  when  attempting  to  analyze  the  results  in  a 
meaningful  way. 
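The statistical problem can be made concrete with a small simulation. The sketch below, offered only as an illustration, generates hypothetical measures with no true group differences at all, for three groups of ten participants (the group sizes Hill and Goebel used) and 266 measures, and counts how many pass a one-way analysis of variance at the .05 level. It assumes the measures are independent normal noise, which real flight measures (many of them highly correlated) are not, and it hard-codes the approximate critical value F(2, 27) = 3.354 that applies only to this group layout.

```python
import random
from statistics import fmean

# Approximate critical F at alpha = .05 for df = (2, 27), i.e. 3 groups of 10.
F_CRIT_3_GROUPS_OF_10 = 3.354


def one_way_f(groups):
    """One-way analysis of variance F statistic for a list of groups."""
    means = [fmean(g) for g in groups]
    all_values = [x for g in groups for x in g]
    grand = fmean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def count_chance_discriminators(n_measures=266, seed=1):
    """Count pure-noise measures that 'discriminate' 3 groups of 10 by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_measures):
        # Every group is drawn from the same distribution: no real difference.
        groups = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(3)]
        if one_way_f(groups) > F_CRIT_3_GROUPS_OF_10:
            hits += 1
    return hits


hits = count_chance_discriminators()
```

With independent measures, roughly 5 percent (about 13 of 266) would be expected to pass by chance alone, which is one reason a measure set chosen this way may fail to cross-validate.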

Vreuls  and  Obermayer  (1974)  began  with  a  candidate  set  of  864  measures  for  a 
simulator  called  the  Jaycopter.  Recognizing  that  the  measure  set  had  to  be 
reduced,  they  favored  using  multiple  discriminant  analysis  across  groups  of  pilots 
who  were  preselected  based  on  experience  as  in  the  Hill  and  Goebel  (1971)  study. 
Vreuls and Obermayer found in their Jaycopter work that control input variables
appeared to provide the best discriminations.

Hill  and  Eddowes  (1974)  felt  that  a  reanalysis  of  the  Hill  and  Goebel  data  was 
necessary.  By  processing  the  variables  they  had  originally  collected,  they  arrived 
at  2,436  separate  measures  of  flight  performance.  They  then  attempted  to  reduce 
this  set  by  using  several  statistical  procedures,  including  analysis  of  variance 
and discriminant analysis (note that both of these procedures will be examined in
the  results  section  of  this  report).  The  authors  were  able  to  reduce  the  measure 
list  down  to  a  subset  of  420  which  discriminated  across  the  three  experience  levels 
of  participant  pilots.  However,  they  concluded  that  approaching  a  measurement  pool 
statistically  was  not  a  practical  method.  The  resultant  discrimination  functions 
were  less  than  perfect  in  correctly  classifying  pilots  into  experience  groups  based 
on measured performance.
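The classification problem these studies faced can be sketched with the simplest form of discriminant analysis: Fisher's linear discriminant for two groups and two measures. This is a simplified stand-in, not the multiple discriminant procedure or software the cited authors used (their analyses involved three experience groups and many more measures). The discriminant weights are w = S_w^-1 (m_b - m_a), where S_w is the pooled within-group scatter matrix and m_a, m_b are the group means; a pilot is assigned to group b when the projected score exceeds the midpoint between the projected group means. The two measures and all data points are hypothetical.

```python
from statistics import fmean


def _mean2(points):
    """Component-wise mean of a list of (x, y) points."""
    return (fmean([p[0] for p in points]), fmean([p[1] for p in points]))


def _project(w, p):
    """Discriminant score: dot product of weights and a measurement pair."""
    return w[0] * p[0] + w[1] * p[1]


def fit_fisher_lda(group_a, group_b):
    """Fit a two-group, two-measure Fisher discriminant; return (w, threshold)."""
    ma, mb = _mean2(group_a), _mean2(group_b)
    # Pooled within-group scatter matrix S_w = [[s00, s01], [s01, s11]].
    s00 = s01 = s11 = 0.0
    for points, m in ((group_a, ma), (group_b, mb)):
        for x, y in points:
            dx, dy = x - m[0], y - m[1]
            s00 += dx * dx
            s01 += dx * dy
            s11 += dy * dy
    det = s00 * s11 - s01 * s01
    if det == 0:
        raise ValueError("within-group scatter matrix is singular")
    # w = S_w^-1 (mb - ma), using the closed-form inverse of a 2x2 matrix.
    d0, d1 = mb[0] - ma[0], mb[1] - ma[1]
    w = ((s11 * d0 - s01 * d1) / det, (-s01 * d0 + s00 * d1) / det)
    # Cut point halfway between the projected group means.
    threshold = (_project(w, ma) + _project(w, mb)) / 2
    return w, threshold


def classification_accuracy(group_a, group_b, w, threshold):
    """Fraction of pilots assigned to their own group (scores above -> group b)."""
    correct = sum(_project(w, p) <= threshold for p in group_a)
    correct += sum(_project(w, p) > threshold for p in group_b)
    return correct / (len(group_a) + len(group_b))


# Hypothetical (measure 1, measure 2) pairs for two experience groups.
novices = [(0, 0), (1, 0), (0, 1), (1, 1)]
experts = [(5, 5), (6, 5), (5, 6), (6, 6)]
w, cut = fit_fisher_lda(novices, experts)
acc = classification_accuracy(novices, experts, w, cut)
```

Accuracy here is computed on the same pilots used to fit the function, so it overstates how well the function would classify new pilots; with overlapping groups it falls below 1.0, consistent with the "less than perfect" classifications reported above.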

McDowell (1978) also found that classification was less than he would have liked
using APM in an Advanced Simulator for Pilot Training (ASPT) which simulated
the T-37 aircraft. McDowell studied three levels of T-37 pilots: preflight,
postflight, and instructor pilots. He focused on control input variables. He
found in the instrumented ASPT that "for simple undemanding maneuvers, novice
pilots behave generally like more experienced pilots (p. 31)." McDowell had a
small number of participants, ten in each group, but limited his principal analyses
to 36 composite control input variables. On the more difficult maneuvers, some
of the variables were useful in separating the three experience groups with an
accuracy of 80 to 90 percent.

The  studies  using  APM  which  have  been  cited  here  are  a  sample  of  the  work  that  has 
been  accomplished.  They  vary  in  terms  of  technical  sophistication  and  measurement 
orientation.  Some  examine  aircraft  state  as  the  primary  indicator  of  performance, 
while  others  are  concerned  with  control  input  variables.  In  some  cases  this 
orientation  may  be  due  to  the  equipment  that  is  on  hand  and  the  magnitude  of 
the  budget  for  hardware  and  software.  What  all  APM  studies  share  is  the  use  of 
automation  in  a  drive  for  greater  objectivity  and  reliability  of  pilot  performance 
measurement.


PILOT  WORKLOAD. 


Workload  is  a  construct  which  is  directly  related  to  aircrew  performance  no  matter 
how  you  measure  it.  Like  performance,  workload  is  viewed  as  multidimensional  in 
character, and there is no single agreed-upon definition. Moray (1982) has
summarized the literature on "mental workload" and has noted that modern automation
has reduced much of the physical exertion involved in operating complex modern
control systems. Rault (1979) has stated that "a pilot performs well and sometimes
even better as he is asked to do more and more and suddenly he is overloaded and
breaks down (p. 418)." This is an oversimplification except in extreme cases.
However, how hard a pilot or crew is working may in fact influence performance
in more subtle ways than producing a complete breakdown. Traditional workload
measurement  has  depended  on  the  postflight  questionnaire,  often  modeled  after  the 
now  famous  Cooper-Harper  Scales.  Postflight  questionnaires  have  the  liability  of 
being  very  memory  dependent  and  do  not  take  into  account  the  ebbs  and  flows  of 
workload  during  the  course  of  a  normal  flight. 

There  have  been  several  recent  studies  conducted  at  the  FAA  Technical  Center  in 
Atlantic City which take a somewhat different approach to aircrew workload.
Rosenberg, Rehmann, and Stein (1982) examined workload as a holistic operator
response. They asked participants who were performing a two-axis tracking task to
respond every minute to a query tone by pushing a workload button. Ten buttons
were arrayed under the participants' nontracking hand. The participants were asked
to press the button, from 1 (very easy) to 10 (very hard), that best described how
hard they were working. Reported workload correlated very well with four levels of
objectively determined task difficulty. In another study performed in a GAT,
participants reported workload which was directly related to flight difficulty as
determined by turbulence and air traffic control (Stein and Rosenberg, 1983).
Unfortunately,  no  direct  performance  data  collection  was  accomplished  during  this 
study.  There  have  been  very  few  studies  which  have  examined  both  performance  and 
workload.  None  have  employed  the  method  for  workload  assessment  just  described. 

Brictson,  McHugh,  and  Naitoh  (1974)  evaluated  pilot  carrier  landing  performance  in 
relation  to  workload.  How  they  evaluated  performance  was  unclear,  but  workload  was 
defined  in  terms  of  the  average  number  of  hours  flown  in  the  previous  week,  the 
number  of  prior  consecutive  years  of  flying,  and  the  relative  danger  of  the 
missions  flown.  For  each  of  three  levels  of  workload,  they  identified  landing 
performance  predictor  variables.  For  low  workload,  it  was  the  pilot's  accident 
history  for  the  past  2  years.  For  moderate  workload,  it  was  experience  in  the 
aircraft,  the  F-4,  which  they  flew.  For  high  workload,  the  best  performance 
predictor  was  the  pilot's  blood  chemistry.  However,  under  high  workloads  as  they 
defined  it,  the  researchers  found  that  the  prediction  was  no  longer  accurate. 

Smith  (1979)  studied  the  performance  of  three-person  air  transport  crews  under 
simulated  flight.  He  reported  a  larger  error  rate  as  the  difficulty  of  the 
flight  was  increased.  The  data  analysis  was  primarily  descriptive  rather  than 
statistical,  and  the  number  of  participants  was  very  small. 




The  interaction  of  workload  and  performance  is  an  important  concern,  and  the 
literature  in  aviation  does  not  do  it  justice.  The  demands  placed  upon  the 
aircrew,  coupled  with  their  internalized  model  of  what  performance  should  be, 
will  interact  with  their  skills  to  produce  a  given  performance  level.  This 
level  will  be  influenced  further  by  a  host  of  variables,  such  as  weather,  to 
complicate  matters.  To  the  extent  that  there  is  any  agreement  at  all  concerning 
the  aviation  human  factors  that  influence  performance  and  workload,  it  would  focus 
on their dynamic and thoroughly complex nature.

RESEARCH  GOAL. 

This  current  research  was  designed  to  support  the  development  and  initial 
evaluation  of  an  APM  System  for  use  in  evaluating  the  impact  of  cockpit  and 
airspace  system  changes  on  pilot  performance  and  workload.  The  goal  was  to  make 
the  most  of  what  was  available  in  terms  of  hardware  and  software  at  the  FAA 
Technical  Center's  Airborne  Simulation  Facility.  The  APM  System  known  as  the  Pilot 
Performance  Index  (PPI)  was  to  be  tested  by  demonstrating  that  it  could  at  least 
discriminate  between  two  groups  of  pilots  who  should  perform  differently  based  on 
their  divergent  experience.  A  subordinate  goal  of  this  study  was  to  attempt  to 
find  a  relationship  between  the  workload  measures  previously  developed  at  the 
Technical  Center  and  the  new  performance  measure,  the  PPI. 


METHOD 


RESEARCH  DESIGN. 

The  objective  of  this  study  was  to  determine  whether  or  not  a  new  measurement 
system  could  functionally  differentiate  pilots  based  on  their  inflight  performance. 
This  was  to  be  the  first  experiment  in  a  series,  and  the  design  was  developed  to 
demonstrate  what  to  a  lay  individual  might  seem  obvious.  Logically,  it  would  seem 
that  pilots  who  differed  drastically  in  experience  should  perform  differently  in 
the  air.  If  the  measures  could  not  discriminate  between  high-time,  professional 
pilots  and  relatively  new,  barely  qualified,  instrument  pilots,  then  they  certainly 
would  never  work  to  make  finer  grained  discriminations  induced  by  systems  or 
procedural changes.

The basic design employed a grouping variable which involved the selection of
pilots.  Half  were  high-time  test  pilots,  and  the  other  half  had  just  received  an 
instrument  rating.  Each  pilot  flew  the  same  flight  plan  under  the  same  conditions 
twice.  This  was  to  evaluate  test-retest  measurement  reliability.  During  data 
analysis  the  design  will  be  further  refined  by  breaking  each  flight  into  segments, 
but  basically  there  were  two  independent  variables,  pilot  group  and  flight. 
Dependent  variables,  or  in  other  words  those  on  which  measures  were  collected, 
could  be  classified  into  four  groups.  The  first  were  those  measures  collected 
automatically  by  the  flight  simulator  system  and  consisted  of  aircraft  state 
variables.  The  second  group  of  measures  were  those  provided  on  performance  rating 
forms  by  three  independent  instructor  pilots.  The  third  set  of  variables  involved 
a  postflight  pilot  questionnaire.  The  final  variable  set  included  workload 
and  response  delay  measures  collected  every  minute  inflight. 




The experimental design was rather straightforward, but obviously data collection
was  complex.  Details  of  how  this  design  was  administered  will  be  described  in 
subsequent  sections. 

PARTICIPANTS. 


Twenty-four  pilots  completed  this  experiment.  All  participants  were  locally 
acquired  volunteers,  who  were  employed  by  one  of  the  following  three  organizations: 
FAA  Technical  Center,  Flight  Inspection  Field  Office  (FIFO),  or  the  New  Jersey  Air 
National Guard 177th Fighter Interceptor Group.

The twelve journeymen (low-time) pilots all held private instrument ratings
and had a median flight time of 161.5 hours, of which a median of 14.5 hours
had occurred in the last 3 months. The master (high-time) pilots all had airline
transport pilot (ATP) ratings, except one individual who held a commercial ticket. The
master pilots had a median of 6,075 hours of flight time, of which a median of
62.5 hours had occurred in the last 3 months. Every member of this group earned
some  portion  of  his  living  through  aviation  as  a  pilot.  In  contrast,  none  of 
the  journeymen  were  professional  pilots.  They  all  had  been  trained  through  an 
experimental  FAA  program  designed  to  see  if  instrument  training  could  be  given  to 
pilots  with  less  than  200  hours  of  flight  time.  They  were  all  trained  by  the  same 
instructors using the same course of instruction. It was fortunate to have such a
relatively homogeneous group of pilots from which to sample.

All  participants  were  carefully  briefed  on  their  rights  to  informed  consent  and 
privacy.  All  data  collection  was  accomplished  by  participant  number,  and  names 
were  not  recorded  on  data  forms. 

EQUIPMENT. 

The  basic  unit  of  equipment,  upon  which  the  entire  experiment  focused,  was  the 
Singer-Link General Aviation Trainer (GAT II). The FAA Technical Center GAT
replicates  the  appearance  and  simulates  the  performance  of  a  Cessna  421,  a  cabin 
class  reciprocating  twin-engine  aircraft.  It  permits  instrument  flying  only  and 
has  no  visual  display  system.  It  is  mounted  on  a  motion  platform  having  2  degrees 
of  freedom  and  is  able  to  provide  vestibular  and  kinesthetic  pilot  cueing  for 
pitch,  roll,  and  to  a  certain  extent,  elevation  changes.  The  cockpit  is  equipped 
with: Collins FD 109 flight director, AP 106 autopilot, twin NAVCOMS, transponder,
automatic direction finder, and other standard instrumentation.

The  GAT  was  equipped  with  one  special  feature  that  was  not  related  to  its  flight 
performance.  This  was  a  workload  response  box  which  was  mounted  just  below  the 
throttles  out  of  the  pilot's  primary  visual  scan.  It  contained  10  pushbutton 
switches  placed  in  a  semicircular  array  and  a  tone  alert  speaker.  At  the  center  of 
the  switch  array  was  a  red  light  emitting  diode,  which  was  turned  on  each  time 
there  was  a  query  tone  requesting  a  workload  response.  This  light  was  to  remain  on 
until  the  participant  pushed  any  button. 
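The workload and response delay measures described in the research design can be derived from two time-stamped streams: query-tone onsets and button presses. The sketch below pairs each tone with the first press at or after it and before the next tone, and reports the button number and the delay in seconds. The log format, the pairing rule, and the treatment of a missed query are assumptions for illustration, and it assumes, as in Rosenberg, Rehmann, and Stein (1982), that the button number is the 1 to 10 workload rating.

```python
from bisect import bisect_left


def workload_responses(tone_times, presses):
    """Pair each query tone with the first button press before the next tone.

    tone_times: ascending tone onset times in seconds (hypothetical log).
    presses: (time_in_seconds, button_number) tuples (hypothetical log).
    Returns one (rating, delay_seconds) per tone; (None, None) if unanswered.
    """
    presses = sorted(presses)
    press_times = [t for t, _ in presses]
    results = []
    for i, tone in enumerate(tone_times):
        next_tone = tone_times[i + 1] if i + 1 < len(tone_times) else float("inf")
        j = bisect_left(press_times, tone)  # first press at or after this tone
        if j < len(press_times) and press_times[j] < next_tone:
            t, button = presses[j]
            results.append((button, t - tone))
        else:
            results.append((None, None))
    return results


# Tones every 60 s; the third query is never answered in this made-up log.
log = workload_responses([0.0, 60.0, 120.0], [(2.5, 3), (61.0, 5)])
```

For this hypothetical log the result is a rating of 3 after 2.5 s, a rating of 5 after 1.0 s, and one missed query.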

This  hardware  is  driven  by  and  provides  inputs  to  several  computer  systems. 
An analog/digital system computes the equations of motion, controls the motion
platform, and drives some of the aerodynamic information displays. Guidance
processing is accomplished with a NAV System Simulation Package (NSSP). Data
collection for both aircraft state variables and pilot workload responses was
accomplished by a Xerox XDS 530 computer which stored the data on magnetic tape.




Finally,  a  Digital  Equipment  Corporation  (DEC)  LSI-11  computer  served  multiple 
roles.  It  provided  flight  track  plotting,  which  was  available  during  each  flight 
and  was  observable  by  the  air  traffic  controller.  This  computer  also  served  the 
additional  task  of  providing  workload  query  tones  every  minute  to  the  pilot. 

The  final  element  of  equipment  in  this  experiment  was  the  instructor's  console. 
This  was  located  in  a  separate  room  from  the  simulator  and  provided  the  work 
station  for  the  air  traffic  controller.  This  console  has  a  repeater  panel,  which 
provides  a  portion  of  the  same  information  that  the  pilot  has  available.  It 
provides  control  over  the  atmospheric  environment  of  the  simulated  flight  and  over 
aircraft  systems  operations.  This  device  permits  simulated  flight  problems  and 
failures to be induced, and communication with the cockpit can be used to provide
air  traffic  control  (ATC)  influence. 

PROCEDURE.

PILOT  TRAINING.  Every  participant  pilot  was  given  an  opportunity  to  become  very 
familiar  with  the  flight  simulator  and  particularly  with  its  instrumentation.  The 
project pilot developed a program of instruction for both the master and journeyman
pilots. Lesson plans for this instruction are presented in appendix A. Master-level
pilots were limited to 1 hour of familiarization training, while journeymen,
who had considerably less experience in complex aircraft, were allowed up to 3 hours
of  instruction.  The  training  pilot  was  advised  by  the  experimenter  to  ensure  that 
all  participants  could  complete  a  basic  multileg  instrument  flight.  All  training 
was  conducted  using  flight  geometry  in  the  vicinity  of  Atlantic  City,  New  Jersey, 
and  with  the  employment  of  standard  air  route  charts.  The  training  pilot  did  not 
find  it  necessary  to  screen  out  any  participants  for  poor  performance  prior  to 
actual  data  collection.  Participants  were  not  exposed  to  the  flight  plan  used  in 
the  experiment  during  the  training  phase. 

Training  was  accomplished  without  external  air  traffic  control.  The  training  pilot 
provided  flight  clearances  in  the  cockpit  as  required.  Training  was  accomplished  in 
increments  of  no  more  than  1  hour.  Prior  to  each  period,  the  training  pilot  read  a 
briefing  to  the  participant.  This  briefing  specified  the  standards  on  which 
performance  would  be  measured.  For  example,  the  participant  was  told  he/she  was 
expected to hold altitude within plus or minus 100 feet and airspeed during cruise within
5  knots.  The  training  briefing  is  provided  in  its  entirety  in  appendix  B. 

MEASURE DEVELOPMENT. Aircrew performance involves a large mass of continually
varying  information,  and  accurate  measurement  of  meaningful  variables  is  a  very 
real  problem.  Vreuls  and  Obermayer  (1973)  made  a  distinction  between  variables  and 
measures.  A  variable  is  any  source  of  information  which  can  take  on  multiple 
values  and  is  quantifiable.  In  the  case  of  an  instrumented  flight  simulator,  there 
are  often  more  variables  than  anyone  really  knows  how  to  manage.  A  listing  of 
those  variables  available  from  the  FAA  Technical  Center  GAT  is  provided  in 
appendix  C.  There  are  87  in  this  list,  not  all  of  which  are  currently  available. 
A  measure  differs  from  a  variable  in  that  it  is  either  a  variable  selected  from  the 
list  based  on  its  characteristics  or  it  is  a  composite  of  variables  which  together 
provide certain measurement benefits. Measures may be chosen either analytically,
empirically,  or  with  some  combination  of  the  two  (Vreuls  and  Obermayer,  1973). 




The primary method of measure selection in this study was analytical. Two subject
matter experts, who were high-time pilots, reviewed the list of variables available
in the Technical Center GAT. Two criteria were used for selection of variables:
significance of the variable for a normal regime of flight and its estimated
potential for separating pilots in terms of performance. Each flight was divided
into six segments: takeoff, climb, en route, descent, initial approach, and final
approach. Variables were assigned to each segment in which they were applicable.
For example, in the takeoff segment, the following variables were listed: heading,
airspeed, manifold pressure, revolutions per minute, pitch angle, and roll angle.
A complete listing of variables within each flight segment is provided in table 1.
The subject matter experts selected "windows," or standards of acceptable
performance around an ideal standard, for each segment of flight. These selections
were based on experience, the FAA instrument flight-check guide, and the aircraft
handbook for the Cessna 421, which the GAT simulates. Each time a variable was
sampled (once every second), the computer doing data reduction would assign one
of three numbers to that sample: if within the inner limits, a two (2) was
assigned; if within the outer limits, or the larger window, a one (1) was
assigned; and if beyond the larger window, the pilot's performance received a
zero (0). This method of coding the performance data greatly simplified analysis
because a great deal of variability was discarded. The trichotomization of each
sampled performance would also serve to smooth the effects of outlying performances
by participant pilots. The PPI consisted of segments, variables, and windows.
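The three-level coding rule can be sketched as follows. This is a minimal illustration, not the data-reduction program used in the study. It assumes that each window is symmetric about the ideal value and that a sample falling exactly on a window boundary counts as inside that window. The function names, the 140-knot airspeed target, and the 5- and 10-knot window half-widths in the example are all hypothetical.

```python
def code_sample(value: float, ideal: float, inner: float, outer: float) -> int:
    """Return the 0/1/2 code for one sample (symmetric windows assumed)."""
    deviation = abs(value - ideal)
    if deviation <= inner:
        return 2  # inside the inner (tighter) window
    if deviation <= outer:
        return 1  # outside the inner window but inside the larger one
    return 0      # beyond the larger window


def code_series(samples: list[float], ideal: float, inner: float, outer: float) -> list[int]:
    """Code a once-per-second stream of samples for one variable."""
    if not 0 <= inner <= outer:
        raise ValueError("windows must satisfy 0 <= inner <= outer")
    return [code_sample(v, ideal, inner, outer) for v in samples]


# Hypothetical airspeed trace (knots) against an assumed 140-knot target.
airspeed = [140, 143, 146, 152, 138]
print(code_series(airspeed, ideal=140, inner=5, outer=10))
```

Because every sample collapses to one of three values, a single large excursion contributes no more than a moderate one. This is the smoothing effect on outlying performances described above.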

It will be noted that no segment of flight was established for turns. This was an
oversight that will have to be corrected in the future. However, turns were
covered by a series of rating scales developed for inflight use and also for
postflight video tape evaluation. The rating scales were referred to as the Flight
Performance Evaluation. They were developed by a separate group of subject matter
experts, made up of the people who would actually have to use them. The
scales were designed to be used in real time. As with the PPI, each flight was
divided into segments, and there was a separate sheet for each segment. Where a
segment type was repeated, such as an en route leg or a turn, there was a separate
sheet for each replication. The goal was to have each element of the flight
evaluated when it was accomplished. In all, three ratings would be independently
completed on each flight, one in the cockpit and two separately on the video tape.
The flight performance rating scales are presented in appendix D.


[...] to the participants once they enter the laboratory. This will be described in
detail.


After completion of training/screening, all participants were treated exactly alike
in terms of procedure. When the individual arrived for the first test flight in
the GAT, he/she was given a series of briefings. The first was conducted by the
experimenter and was titled the "Participant Briefing" (see appendix E). This
described the reasons for doing the research and explained the individual's rights
to informed consent and privacy. The participant was told that he/she would
receive no performance feedback after the first test flight and to hold any
questions until the second flight in the series had been completed. The second
briefing was also done by the experimenter. This was titled the "Workload Scale
Instructions" (see appendix F). The purpose of this briefing was to explain the
operation of the workload response box and the verbal anchors on the workload
scale. Also, an attempt was made to "motivate" the pilot to respond every minute
during each flight. The pilot was already seated in the cockpit during this
briefing. When it was completed, the experimenter left the cockpit, and the
instructor pilot entered and seated himself in the jump seat. He then read
the "Test Flight Briefing" (appendix G) to the participant. This briefing
reemphasized the performance standards that were desired. Upon its completion, the
instructor pilot provided the participant with a flight plan for the test flight.
This consisted of a low-to-moderate difficulty instrument round-robin flight
beginning and terminating at the Atlantic City Airport, New Jersey. All flight
conditions were viewed as a normal regime of flight. There were no surprises and no
imposed emergencies. All flights were "free" flown without automatic pilot or
flight director. Neither wind nor turbulence was injected into the scenario. A
diagram of the flight geometry is available in appendix H.


TABLE 1. LIST OF VARIABLES WITHIN EACH FLIGHT SEGMENT

Takeoff            Heading, Airspeed, Manifold Pressure, Engine RPM, Pitch, Bank
Climb              Heading, Airspeed, Manifold Pressure, Engine RPM, Pitch, Bank,
                   Gear, Flaps, IVSI
En Route           Altitude, Manifold Pressure, Engine RPM, Pitch, Heading,
                   CDI Deflection, OBS Error
Descent            Heading, Airspeed, Manifold Pressure, Engine RPM, Pitch, Bank,
                   CDI Deflection, OBS Error, IVSI
Initial Approach   Heading, Airspeed, Manifold Pressure, Engine RPM, Pitch, Bank,
                   Gear, Flaps
Final Approach     Heading, Airspeed, Manifold Pressure, Engine RPM, Pitch, Bank,
                   Gear, Flaps, CDI Error, VDI Error, IVSI



Once briefed and familiarized with the flight plan, the pilot was literally on
his/her own. Although the instructor pilot sat in the jump seat, his sole function
was to complete the ratings in the Flight Performance Evaluation. He was under
instructions not to respond to participant questions or to provide feedback at the
end of the first flight.

The pilot was told to call for ATC clearance and proceed as normal for an actual
flight. ATC was operated by a pilot who worked from a script developed by an air
traffic controller. ATC provided all clearances and background traffic, which
was also scripted (see appendix I) on a timetable geared to the location of the
simulated aircraft on the plotted flight geometry. The air traffic controller had a
constant view of the Hewlett-Packard plotter, which preplotted the entire flight
geometry and then overplotted the actual flight track as performed by the pilot
participant. An example of this flight track plot is presented in figure 1.


FIGURE 1. SAMPLE FLIGHT TRACK PLOT




The ATC also served the purpose of assisting pilots who developed navigation
problems. This did not occur with the masters-level pilots but did appear as a
problem with several journeymen. ATC provided guidance back to the radial in the
original flight plan. It was felt that there was enough measurement capacity in
the experiment that this would not unduly influence the results. In fact,
assisting lost pilots would have helped the scores of the journeymen group. This
would have pushed the two groups closer together, which biases the experiment
against the results that were hypothesized. This is generally considered a
legitimate form of experimenter-induced bias, especially when the hypothesized
effects are still obtained.

The second flight was completed sometime after the first, based on participant
availability and equipment scheduling considerations. While a constant interflight
interval was desired, it turned out not to be possible. Intervals ranged from as
short as 1/2 hour to as long as 1 week. The second flight was conducted exactly as
the first flight. Each briefing, with the exception of the participant briefing, was
again presented verbatim. The flight geometry and the ATC script were exactly the
same.

At  the  completion  of  each  flight,  the  participant  was  given  a  brief  "Flight 
Workload  Questionnaire"  (appendix  J).  This  was  completed  before  leaving  the 
cockpit  and  before  the  experimenter  administered  an  informal  interview.  At  the  end 
of  the  second  flight,  all  participant  questions  were  answered,  and  the  flight  track 
plots  were  available  for  examination. 

DATA COLLECTION PROCEDURES. During each test flight, there were four sources of
data: the Flight Performance Evaluation, the Flight Workload Questionnaire, the
Automated Performance Measurement, and video tape of the flight instruments.
The first two sources have already been discussed. The Automated Performance
Measurement consisted of storing all GAT variables at a sampling rate of once
per second. This was accomplished by a Xerox XDS-530 computer, which placed the
information on magnetic tape for later reduction in another computer. Data for
workload response and delay were also stored on the same tapes. A video camera was
mounted through the cockpit window over the pilot's left shoulder. It recorded all
the primary flight instruments during each test flight. These video tapes were
reviewed independently by two separate instructor pilots, who completed performance
ratings using the same Flight Performance Evaluation form that had been used in the
cockpit. These ratings were completed blind, in that no participant pilot
identifying information was provided with the video tapes. Tape reviewers were
provided with the flight track plots with the pilot code numbers removed. Tapes
and plots were assigned random three-digit code numbers for control purposes. Only
the experimenter possessed the key list and could associate the three-digit code
with masters and journeymen participants.


RESULTS 


QUALIFICATIONS, OBJECTIVES, AND STRATEGY.

This was the first experiment in a proposed series designed to develop and
evaluate measurement techniques in the areas of pilot performance and workload.
Participants in this experiment were local volunteers and as such may or may not be
representative of the population of general aviators. In the hope that there was
some correspondence with the population, inferential statistics have been employed
as well as descriptive and regression techniques. Where inferences are made, the
reader should draw his own conclusions about the representativeness of the sample.
The goal of the data analyses reported here was to draw as much out of the results
as seemed feasible without overworking the data.

RESULTS  SUMMARY. 

The Automated Performance Measure (APM) was called the Pilot Performance Index
(PPI). Each variable (e.g., airspeed) was initially analyzed within each flight
segment to determine if it would separate the two pilot groups. The results of
these preliminary analyses led to a reduction in the number of variables within
each flight segment and the elimination of the takeoff segment, where no variables
separated the two pilot groups. An analysis of variance (ANOVA) conducted on the
PPI scores demonstrated the superiority of the masters pilots in all segments of
flight. The same analysis showed that there were performance differences across
the flight segments (i.e., descent was the poorest and final approach was the
best). These performance differences occurred for master and journeyman pilots
alike. Both groups also tended to improve their performance slightly from the
first to the second flight. Regression techniques confirmed the performance
separation between the two groups.

The performance ratings were made by three independent raters. Their level of
agreement, as measured by interrater reliability correlations, was very high for
flight segment means. Their data were averaged to produce one set of ratings
for each flight. Analysis on each segment of flight indicated that the ratings
separated masters from journeymen on all but the takeoff segment, which was
deleted. The turn segment was also deleted because of a strong tendency for
pilots to improve between flights. A three-way ANOVA indicated that there was
clear separation between the pilot groups. There was also a strong segments effect
and a weak improvement between flights for both groups. There was an interaction
between the pilots and segments variables. This meant that, unlike the PPI
results, the performance ratings identified a different pattern of performance
across flight segments for the two groups of participants. The two segments where
performance was best, climb and descent, were in reverse order for the two groups.
Regression techniques confirmed these results.

The ANOVA of the inflight workload data indicated that journeymen felt they were
working much harder than the masters pilots. Both groups indicated a lowered
workload the second time they flew the same flight plan. There was significant
variability across flight segments for both groups. The lowest workload segment
was en route, and the highest was final approach. A postflight questionnaire also
demonstrated the higher perceived workload for the less experienced pilots.

Comparisons were made between key variables. The two measures of workload,
inflight and postflight, were strongly correlated. The APM, using the PPI,
correlated r = .82 with the performance ratings for total flight scores when the
entire sample was considered. There was a moderate negative correlation
(r = -.567) between the PPI and the inflight workload measure. The postflight
workload measure had approximately the same relationship with the PPI, r = -.570.
The postflight workload measure correlated r = -.710 with the performance rating
data. Pilots who performed at the lower end of the continuum felt that they had to
work harder to do it.




As described in an earlier section, there were two types of performance measurement
employed in this study. The first was APM, which used the computer to collect
(aircraft state) data on a second-by-second basis. The second method involved
performance ratings by three independent observers. Each of these data sets will
be described separately.

AUTOMATED PERFORMANCE MEASUREMENT. The reader will recall that the flight
simulation system used in this experiment could record and store
approximately 87 variables. This list was reviewed analytically by subject matter
experts, and subsets of the total variables available were assigned to each segment
of flight. A list of these selected variables was presented earlier in the method
section (table 1).

The primary purpose of the initial analyses on these data, which would become the
PPI, was to further screen the variables. It was important to eliminate those
variables which would not contribute to the separation of the two pilot groups,
the masters and the journeymen. Once the data were collected from the 24 pilot
participants, further variable screening was done empirically, using the data
themselves as a guide.

The statistical technique ANOVA was used for this purpose. In simple terms,
ANOVA is a method of dividing up, or partitioning, the variance in an experiment
according to specific sources of variance. Given the experimental design, there were
three important possible sources of variation: the performance variability between
pilot groups, the variability between the two flights each pilot "flew," and the
interaction between these two variables. ANOVA compares each source of variation
to an error term, which takes into account uncontrollable variability, such as the
differences between individual pilots. If this ratio, called "F," is large enough,
then the result is significant and is not likely to have occurred by chance alone.
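To make the partitioning concrete, the sketch below computes a pilots-by-flights ANOVA of the kind described above, with pilots nested in groups and every pilot flying every flight. It is a textbook split-plot partition written for illustration, not the software used in the study, and the function name and toy scores are hypothetical. It also reports each effect's correlation ratio, taken as that effect's sum of squares divided by the total sum of squares (the statistic introduced in the next paragraph).

```python
from statistics import mean


def mixed_anova(groups: dict[str, list[list[float]]]) -> dict[str, tuple[float, float]]:
    """Pilots-by-flights ANOVA with pilots nested in groups (split-plot design).

    groups maps a group name (e.g. "masters") to a list of pilots; each pilot is a
    list of scores, one per flight. Returns {effect: (F ratio, correlation ratio)}.
    """
    pilots = [scores for rows in groups.values() for scores in rows]
    n_pilots, n_flights, n_groups = len(pilots), len(pilots[0]), len(groups)
    all_scores = [x for scores in pilots for x in scores]
    grand = mean(all_scores)
    ss_total = sum((x - grand) ** 2 for x in all_scores)

    # Between-pilot variability: the groups (pilots) effect plus pilots-within-groups error.
    ss_between = n_flights * sum((mean(s) - grand) ** 2 for s in pilots)
    ss_groups = n_flights * sum(
        len(rows) * (mean(x for s in rows for x in s) - grand) ** 2
        for rows in groups.values()
    )
    ss_err_between = ss_between - ss_groups

    # Within-pilot variability: flights effect, flights x groups, residual error.
    flight_means = [mean(s[j] for s in pilots) for j in range(n_flights)]
    ss_flights = n_pilots * sum((m - grand) ** 2 for m in flight_means)
    ss_cells = sum(
        len(rows) * (mean(s[j] for s in rows) - grand) ** 2
        for rows in groups.values()
        for j in range(n_flights)
    )
    ss_inter = ss_cells - ss_groups - ss_flights
    ss_err_within = ss_total - ss_between - ss_flights - ss_inter

    df_groups, df_flights = n_groups - 1, n_flights - 1
    df_err_between = n_pilots - n_groups
    df_inter, df_err_within = df_groups * df_flights, df_err_between * df_flights
    ms_err_between = ss_err_between / df_err_between
    ms_err_within = ss_err_within / df_err_within

    return {
        "pilots": ((ss_groups / df_groups) / ms_err_between, ss_groups / ss_total),
        "flights": ((ss_flights / df_flights) / ms_err_within, ss_flights / ss_total),
        "pilots x flights": ((ss_inter / df_inter) / ms_err_within, ss_inter / ss_total),
    }


# Hypothetical toy data: two pilots per group, two flights each.
toy = {"masters": [[2, 3], [4, 3]], "journeymen": [[6, 8], [8, 10]]}
for effect, (f_ratio, eta) in mixed_anova(toy).items():
    print(f"{effect:16s} F = {f_ratio:5.2f}  correlation ratio = {100 * eta:5.1f}%")
```

Note that the pilots effect is tested against the between-pilot error, while the flights effect and the interaction are tested against the within-pilot error. This is why each source in such a design has its own error term.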

Each variable in the original PPI list was subjected to a two-way, pilots-by-flights
ANOVA. The results are reported in table 2, titled "Flight Variable
Screening Using Analysis of Variance." Also reported is the correlation ratio,
which is the proportion of variability in an analysis that can be accounted for by
a specific source. According to Linton and Gallo (1975), correlation ratios above
10 percent are equal or superior to a great many of the so-called significant effects
reported in the literature.

Decisions in terms of variable deletion or retention are listed on the right-hand
side of the table. These decisions were based on several criteria. If the pilots
effect (the difference between masters and journeymen) was significant, then the
variable was retained unless there was also a significant flights effect. If
either the flights effect or the interaction between flights and pilots (not
shown in the table) was significant, then the variable was deleted. A variable with no
significant pilots effect could still be retained if its correlation ratio was
3 percent (an arbitrary choice) or greater. One final criterion for retention
concerned the paired variables of RPM and manifold pressure, for which there was a
reading for the left and right engines. If either variable of a pair was deleted,
then both were deleted. It seemed illogical, for example, for RPM or manifold
pressure on the right engine to separate the pilot groups while the comparable
numbers for the left engine failed to do so. Where actual discrepancies did occur,
they were attributed to artifacts in the flight simulator. The final list of
variables after screening is shown in table 3.
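The retention rules can be expressed compactly. The sketch below is one reading of the criteria in the preceding paragraph, assuming a .05 significance level. The class and function names are hypothetical, and the p values in the example are invented for illustration; they are not taken from table 2.

```python
from dataclasses import dataclass

ALPHA = 0.05     # assumed significance level
MIN_RATIO = 3.0  # the "arbitrary" 3 percent correlation-ratio threshold


@dataclass
class Candidate:
    name: str
    pilots_p: float           # p value for the pilots (masters vs. journeymen) effect
    flights_p: float          # p value for the flights effect
    interaction_p: float      # p value for the pilots x flights interaction
    correlation_ratio: float  # pilots-effect correlation ratio, in percent


def retain(v: Candidate) -> bool:
    """Single-variable rules, applied before the left/right pairing rule."""
    if v.flights_p < ALPHA or v.interaction_p < ALPHA:
        return False  # unstable from flight to flight: delete
    return v.pilots_p < ALPHA or v.correlation_ratio >= MIN_RATIO


def screen(candidates: list[Candidate], pairs: list[tuple[str, str]]) -> dict[str, bool]:
    decisions = {v.name: retain(v) for v in candidates}
    # Left and right engine readings are kept or dropped together.
    for left, right in pairs:
        keep_both = decisions[left] and decisions[right]
        decisions[left] = decisions[right] = keep_both
    return decisions


# Hypothetical screening results for four variables in one segment.
example = [
    Candidate("Heading", 0.06, 0.50, 0.60, 10.08),
    Candidate("Manifold - L", 0.40, 0.70, 0.80, 1.20),
    Candidate("Manifold - R", 0.30, 0.65, 0.90, 3.12),
    Candidate("Pitch", 0.01, 0.02, 0.50, 20.00),
]
print(screen(example, [("Manifold - L", "Manifold - R")]))
```

In this example, Heading survives on its correlation ratio alone, Manifold - R is dropped only because its left-engine partner fails, and Pitch is dropped despite a strong pilots effect because its flights effect is significant.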




TABLE 2. FLIGHT VARIABLE SCREENING USING ANALYSIS OF VARIANCE

                    Pilots Effect
                    Correlation    Pilots Effect         Flights Effect
Variable            Ratio          Significance          Significance        Decision

Takeoff
  Heading                                                                    Delete
  Airspeed                                                                   Delete
  Manifold - L                                                               Delete
  Manifold - R      3.12%                                                    Delete
  RPM - L                                                                    Delete
  RPM - R                                                                    Delete
  Pitch                                                                      Retain
  Bank                                                                       Delete
Climb
  Heading           10.08%         F=3.93 (p=.06)                            Retain
  Airspeed          [illegible]    [illegible] (p<.01)                       Retain
  Manifold - L                                                               Delete
  Manifold - R      [illegible]                                              Delete
  RPM - L                                                                    Delete
  RPM - R                                                                    Delete
  Pitch                                                                      Delete
  Bank              1.36%                                                    Delete
  Gear              [illegible]    F=4.01 (p=.0537)                          Retain
  Flaps                                                                      Delete
  IVSI              4.31%                                                    Retain
En Route
  Altitude          [illegible]    F=3.43 (p=.069)                           Retain
  Manifold - L                                                               Delete
  Manifold - R                                                               Delete
  RPM - L                                                                    Delete
  RPM - R                                                                    Delete
  Pitch             36.11%         F=14.79 (p<.001)                          Retain
  Heading           36.07%         F=13.97 (p<.001)                          Retain
  CDI               [illegible]    F=11.20 (p<.001)                          Retain
  OBS               24.36%         F=10.16 (p<.01)                           Retain
Descent
  Heading           4.09%                                                    Retain
  Airspeed          7.26%                                                    Retain
  Manifold - L                                                               Delete
  Manifold - R      6.61%          F=3.21 (p=[illegible])                    Delete
  RPM - L                                                                    Delete
  RPM - R                                                                    Delete
  Pitch                                                  F=5.72 (p<.03)      Delete
  Bank              6.21%                                                    Retain
  CDI               [illegible]    F=3.52 (p<.03)                            Retain
  OBS               [illegible]    [illegible] (p=.08)                       Retain
  IVSI              4.91%                                                    Retain
Initial Approach
  Heading           24.73%         F=19.07 (p<.001)      F=3.09 (p=.093)     Retain
  Airspeed          2.07%                                                    Delete
  Manifold - L      8.35%          F=3.22 (p=.086)                           Retain
  Manifold - R      [illegible]                                              Retain
  RPM - L                                                                    Delete
  RPM - R                                                                    Delete
  Pitch                                                                      Delete
  Bank              13.94%         F=6.86 (p<.03)                            Retain
  Gear                                                                       Delete
  Flaps                                                  F=4.30 (p<.03)      Delete
Final Approach
  Heading           [illegible]    F=4.45                                    Retain
  Airspeed                                                                   Delete
  Manifold - L      2.91%                                                    Delete
  Manifold - R                                                               Delete
  RPM - L           2.69%                                                    Delete*
  RPM - R           4.86%          F=[illegible] (p=.05)                     Delete*
  Pitch                                                                      Delete
  Bank              [illegible]    F=[illegible] (p=.10) F=3.13 (p=.09)      Delete**
  Gear              7.37%                                                    Retain
  Flaps             19.13%         F=7.50 (p<.03)                            Retain
  CDI               10.41%         F=3.21 (p<.03)        F=3.47 (p=.075)     Delete**
  VDI               4.46%                                                    Retain
  IVSI                                                   F=2.96 (p=.10)      Delete

[Note illegible.]
F values with tail probabilities > .10 are suppressed.
F values with tail probabilities > .05 are not considered significant. Since this
was a screening effort, those between .05 and .10 are shown.
*Deleted because of interactions with flight variable.
**Deleted to lower flights effect.




TABLE 3. PILOT PERFORMANCE INDEX VARIABLE LIST

Takeoff            Pitch
Climb              Heading, Airspeed
En Route           Altitude, Pitch Angle, Heading, CDI, OBS
Descent            Heading, Airspeed, Bank Angle, CDI, OBS, IVSI
Initial Approach   Heading, Manifold Left, Manifold Right, Bank Angle
Final Approach     Heading, Gear Position, Flap Position, VDI


PPI data, as described in the method section of this report, represent trichotomous
information. At each point where the computer sampled from the data stream, the
sample of pilot performance in terms of aircraft state was compared against the
"windows," or standards, and a zero (0), one (1), or two (2) was assigned. The
reader should keep this in mind when examining PPI data, because the range
must always be between zero and two, with the latter value representing best
performance.

The next step in the PPI data analysis was to produce unweighted segment scores for
each pilot on each flight. This was done by the simple linear addition of all PPI
data within a segment of flight for that particular pilot. This sum was divided by
the number of variables entering the segment multiplied by the number of sample
points within that segment for that flight. The result was a segment score for
pilot 03 (for example) on the first flight, and this score ranged from zero to
two.
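The segment-score arithmetic can be sketched as follows. The sketch assumes the 0/1/2 codes for one pilot, one flight, and one segment are held as one list per variable, with one entry per once-per-second sample. The function name and data layout are illustrative, not taken from the report.

```python
def segment_score(codes: list[list[int]]) -> float:
    """Unweighted PPI segment score for one pilot on one flight.

    codes[v][t] is the 0/1/2 code for variable v at sample t.
    """
    n_variables = len(codes)
    n_samples = len(codes[0])
    if any(len(row) != n_samples for row in codes):
        raise ValueError("every variable must have the same number of samples")
    if any(c not in (0, 1, 2) for row in codes for c in row):
        raise ValueError("PPI codes must be 0, 1, or 2")
    total = sum(sum(row) for row in codes)
    # Dividing by (variables x samples) keeps the score on the 0-to-2 scale.
    return total / (n_variables * n_samples)


example = [
    [2, 2, 1, 0],  # e.g. heading codes over four seconds
    [2, 1, 1, 2],  # e.g. bank-angle codes over the same four seconds
]
print(segment_score(example))
```

Here the codes sum to 11 over 2 variables x 4 samples, giving 11 / 8 = 1.375. A pilot who stayed inside every inner window would score exactly 2.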




Once segment scores were computed, a pilots-by-flights ANOVA was run on each
segment of flight independently. This was done first with all the original
variables (before screening) included in the segment scores. The ANOVAs were
repeated after deletion of selected variables and recomputation of the segment
scores. Table 4 provides the F and correlation ratios for the pilots and flights
effects when all PPI variables were used in the segment scores.

Table 5 shows the results of the second set of ANOVAs after deletion of a
considerable number of variables. Comparison across these two tables is
informative. It shows gains in F and correlation ratios for all segments with
the possible exception of takeoff. In addition, the climb and initial approach
segments lost their significant flights effects, which was a desirable change.
The flights effect in this context was an indicator of lack of measurement
(test-retest) reliability. The difference between the two tables was attributable
to the removal of variables that contributed more to error than they did to the
discrimination between the two pilot groups. Since none of the entry variables in
the takeoff segment appeared to be workable, this segment was dropped from further
analysis.

A pilots-by-flights-by-segments three-way ANOVA was computed to determine whether
these three variables interacted in any way. An interaction could have meant that
performance variability across the entirety of a flight was dependent on pilot
experience. Table 6 provides the mean PPI scores for each pilot group across
the five segments of flight, and table 7 provides a detailed summary of the ANOVA.

An examination of the mean PPI scores shows what appears to be a consistent
difference between the two groups of pilots in every segment of flight. This
replicates the analyses already reported. There are also apparent
differences between segments. The small magnitude of the numbers in the PPI score
data might lead one to conclude, falsely, that these differences are also small.
What is important, however, is not the size of the numbers but how far group
means differ in relation to within-group variability, or error. The ANOVA
summary shows both pilots and segments effects that are significant and that each
account for more than 10 percent of the variability. The flights effect, although
significant, accounted for only 1.39 percent of the variability. There was no
interaction between pilots and segments. At the risk of accepting the null
hypothesis (viewing the lack of a significant effect as a positive finding), it
appears that performance differences across segments of flight are not dependent on
pilot experience. The ordinal relationship of performance to segments is the same
for both groups (see table 6). Performance was best in the final approach segment
and worst in the descent.

The significant F ratio on the segments effect demonstrated that the variability due
to segments exceeded what would be expected by chance, as estimated by the error term
(segments by S's within groups). The F ratio does not explain where the actual
differences exist. This is evaluated by another technique called a Newman-Keuls
analysis. The first step in a Newman-Keuls analysis is to order the means of the
segments (or of the levels of whatever variable is being evaluated). Since there was
no interaction between pilots and segments, the means to be ordered are those for the
segments effect, with masters and journeymen data pooled.




TABLE 4. ANALYSIS OF VARIANCE ON PPI SEGMENT SCORES, ALL PPI VARIABLES INCLUDED

                                  Pilots                      Flights
                   Number of      F            Correlation    F            Correlation
Segment            Variables      Ratio        Ratio          Ratio        Ratio
Takeoff            8              0.02         0.04%          1.35         2.32%
Climb              11             2.54         7.07%          3.97*        4.76%
En Route           9              13.24**      29.61%         1.65         1.48%
Descent            11             2.60         8.03%          3.60         3.34%
Initial Approach   10             2.84         6.79%          4.54*        6.94%
Final Approach     13             4.58*        11.22%         3.22         4.32%

*  P<.05
** P<.01


TABLE 5. ANALYSIS OF VARIANCE ON PPI SEGMENT SCORES
AFTER DELETION OF SELECTED VARIABLES

                                  Pilots                      Flights
                   Number of      F            Correlation    F            Correlation
Segment            Variables      Ratio        Ratio          Ratio        Ratio
Takeoff            1              [illegible]  [illegible]    0.39         0.59%
Climb              4              6.73*        16.80%         0.90         1.09%
En Route           5              25.84**      47.18%         0.95         0.52%
Descent            6              7.15*        19.51%         2.83         2.19%
Initial Approach   4              9.79**       22.18%         3.62         3.95%
Final Approach     4              9.34**       20.49%         4.10         4.30%

*  P<.05
** P<.01




TABLE 6. MEAN AUTOMATED PERFORMANCE SCORES USING PPI

                                 Flight        Pilot Group
Pilot Group    Segment           1      2      Mean
Masters        Climb             1.47   1.50
               En Route          1.73   1.75
               Descent           1.42   1.44
               Initial Approach  1.54   1.65
               Final Approach    1.90   1.91
               Flight Mean       1.61   1.65   1.63

Journeymen     Climb             1.20   1.30
               En Route          1.42   1.47
               Descent           1.06   1.23
               Initial Approach  1.27   1.38
               Final Approach    1.65   1.81
               Flight Mean       1.32   1.44   1.38

TABLE 7. AUTOMATED PERFORMANCE SCORES, PPI ANALYSIS OF VARIANCE
(Pilots by Flights by Segments)

Source of                                 Correlation    F
Variability              DF*    MS        Ratio          Ratio
Pilots (P)               1      3.81      15.12%         29.98**
Error                    22     0.127
Flights (F)              1      0.350     1.39%          [illegible]
F x P Interaction        1      0.097     0.38%          2.64
Error                    22     0.037
Segments (S)             4      2.085     33.05%         30.18**
S x P Interaction        4      0.025     0.39%          0.36
Error                    88     0.069
F x S Interaction        4      0.012     0.19%          0.39
F x S x P Interaction    4      0.014     0.22%          0.44
Error                    88     0.031

*  Degrees of Freedom
** P<.01
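The columns of table 7 are related by simple arithmetic. The sketch below recomputes each F ratio as the effect mean square divided by the mean square of its error term, and each correlation ratio as the effect sum of squares (df x MS) over the total sum of squares. Because it works only from the rounded values printed in the table, its results differ slightly from the published figures (for example, F comes out near 30.0 rather than 29.98 for pilots). The pairing of effects with error terms (labeled E1 to E4 here, hypothetical names) is an assumption based on the usual layout of a design with repeated measures on flights and segments. From these rounded values, the flights F ratio, which is illegible in the table, comes out at about 9.46.

```python
# (df, mean square, error term) for each effect, copied from table 7.
SOURCES: dict[str, tuple[int, float, str]] = {
    "Pilots (P)": (1, 3.81, "E1"),
    "Flights (F)": (1, 0.350, "E2"),
    "F x P": (1, 0.097, "E2"),
    "Segments (S)": (4, 2.085, "E3"),
    "S x P": (4, 0.025, "E3"),
    "F x S": (4, 0.012, "E4"),
    "F x S x P": (4, 0.014, "E4"),
}
# (df, mean square) for the four error rows, in table order.
ERRORS: dict[str, tuple[int, float]] = {
    "E1": (22, 0.127), "E2": (22, 0.037), "E3": (88, 0.069), "E4": (88, 0.031),
}


def summarize(sources, errors) -> dict[str, tuple[float, float]]:
    """Return {effect: (F ratio, correlation ratio in percent)}, rounded to 2 places."""
    # Each sum of squares is df * MS; the total is the sum over every row of the table.
    ss_total = sum(df * ms for df, ms, _ in sources.values())
    ss_total += sum(df * ms for df, ms in errors.values())
    summary = {}
    for name, (df, ms, err) in sources.items():
        f_ratio = ms / errors[err][1]          # MS(effect) / MS(its error term)
        corr_ratio = 100 * df * ms / ss_total  # SS(effect) / SS(total), in percent
        summary[name] = (round(f_ratio, 2), round(corr_ratio, 2))
    return summary


for name, (f_ratio, pct) in summarize(SOURCES, ERRORS).items():
    print(f"{name:13s} F = {f_ratio:6.2f}   correlation ratio = {pct:5.2f}%")
```

The same arithmetic explains why an effect can be significant yet small: the flights effect has a large F because its error term is small, but it accounts for under 1.4 percent of the total variability.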




Table  8  provides  the  ordered  means  and  the  differences  between  each  pair  of  means. 
These  differences  are  then  compared  against  the  significance  criteria  listed  below, 
and  those  which  exceed  the  criteria  are  considered  significantly  different.  It 
will  be  noticed  that  the  further  two  means  are  apart  in  ordered  steps,  the  more 
difficult  it  is  for  the  difference  between  them  to  reach  significance.  This  makes 
the  Newman-Keuls  method  more  conservative  than  other  techniques  which  employ  the 
same  critical  value  or  significance  criteria  for  all  comparisons  between  means. 
Lines  below  segments  in  the  analysis  summary  indicate  there  is  no  significant 
difference  between  those  segments. 
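The comparison procedure can be sketched using the ordered means and significance criteria from table 8. This is a minimal sketch under stated assumptions. The critical differences are taken as given rather than computed, and the function applies the usual Newman-Keuls step-down rule, under which a pair lying inside a wider range that was not significant is itself declared not significant. The table's criteria appear consistent with studentized-range values at the .01 level multiplied by the square root of 0.069/48 (the segments error mean square from table 7 over 48 scores per pooled segment mean), but that derivation is an inference, not something the report states. The function and constant names are hypothetical.

```python
PPI_MEANS = {
    "Descent": 1.28431,
    "Climb": 1.36606,
    "Initial Approach": 1.46019,
    "En Route": 1.59050,
    "Final Approach": 1.81561,
}
# Critical differences keyed by the number of ordered steps spanned (table 8).
CRITICAL = {2: 0.1414, 3: 0.1607, 4: 0.1724, 5: 0.1806}


def newman_keuls(means: dict[str, float], critical: dict[int, float]) -> dict[tuple[str, str], bool]:
    """Return {(lower-mean level, higher-mean level): significant?}."""
    ordered = sorted(means, key=means.get)
    k = len(ordered)
    result: dict[tuple[str, str], bool] = {}
    # Widest ranges first, so every enclosing range is decided before a nested one.
    for steps in range(k, 1, -1):
        for i in range(k - steps + 1):
            j = i + steps - 1
            inside_ns_range = any(
                not result[(ordered[a], ordered[b])]
                for a in range(i + 1)
                for b in range(j, k)
                if (a, b) != (i, j)
            )
            diff = means[ordered[j]] - means[ordered[i]]
            result[(ordered[i], ordered[j])] = not inside_ns_range and diff > critical[steps]
    return result


for (low, high), significant in newman_keuls(PPI_MEANS, CRITICAL).items():
    diff = PPI_MEANS[high] - PPI_MEANS[low]
    print(f"{low:16s} vs {high:16s} {diff:.5f}  {'significant' if significant else 'n.s.'}")
```

Run on the table 8 values, this marks seven of the ten pairs as significant. Only the adjacent pairs descent-climb, climb-initial approach, and initial approach-en route fail to reach their two-step criterion.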


TABLE 8. NEWMAN-KEULS ANALYSIS OF PPI SEGMENTS EFFECTS

PPI Segment Means

                                       Initial               Final
Segment            Descent   Climb     Approach   En Route   Approach
Mean PPI Scores    1.28431   1.36606   1.46019    1.59050    1.81561

                                       Initial               Final
                   Mean      Climb     Approach   En Route   Approach
Descent            1.28431   0.08175   0.17588**  0.30619**  0.53130**
Climb              1.36606             0.09413    0.22444**  0.44955**
Initial Approach   1.46019                        0.13031    0.35542**
En Route           1.59050                                   0.22511**
Final Approach     1.81561

** P<.01

Significance Criteria

Ordered Steps:   2        3        4        5
                 0.1414   0.1607   0.1724   0.1806

Analysis Summary

Descent  Climb  Initial Approach  En Route  Final Approach
--------------
         -----------------------
                --------------------------




The PPI data were also evaluated using regression analysis. This method, like
ANOVA, partitions variability or variance. Regression examines the relationship
of a number of independent variables to one or more dependent variables. It
determines the optimal linear combination of variables and provides a prediction
equation so that an individual's performance on one set of scores can be
predicted from another set. For the purposes of this experiment, it was desirable
to see if group membership could be predicted from segment score performance.
Entering this analysis as the independent variables were five segment scores for
each pilot. The dependent variable was group membership, coded as 1 for masters and
2 for journeymen. Three multilinear regressions were computed on the PPI data, one
for each flight independently and one for the data with flights pooled. The results
are described in table 9.


TABLE 9. MULTILINEAR REGRESSION ON PPI SCORES

                                             Multiple Regression  Relative Frequency of
                  Multiple r   Multiple r²   F Ratio              Correct Classification
Flight 1          0.814        0.662         7.062**              22/24
Flight 2          0.719        0.517         3.848*               22/24
Flights Pooled    0.811        0.657         6.906**              23/24

*  P<.05
** P<.01

Regression Intercept and Weights

                  Y                                         Initial     Final
                  Intercept  Climb    En Route  Descent   Approach    Approach
Flight 1          4.620      -0.052   -1.133    -0.557    -0.325      -0.068
Flight 2          5.199      -0.020   -1.054    -0.217    -0.436      -0.554
Flights Pooled    4.868      0.106    -1.410    -5.61     -4.68       0.074

25 


In contrast to a stepwise regression, which will be discussed shortly, multilinear
regression uses all the independent variables and combines them, taking into
account the contribution of each to prediction and the degree to which they covary
with each other. Table 9 includes quite a bit of information. The multiple r is
the multiple correlation between the independent variables and the dependent
variable (pilot group membership). It indicates the strength of the relationship,
which is stronger the closer r approaches 1. The multiple r squared has been called
the coefficient of determination and is similar to the correlation ratio used
earlier to help interpret the results of ANOVA. It estimates the proportion of
variability in the dependent variable that can be explained by the variability in
the independent variables: the higher the multiple r squared, the better the
regression. The F on the regression determines whether the variability explained by
the regression is beyond chance. As indicated by the asterisks, the F ratios were
significant for all three regressions.
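The F ratio on the regression and an adjusted r squared both follow from the multiple r squared through the standard formulas F = (r2 / k) / ((1 - r2) / (n - k - 1)) and adjusted r2 = 1 - (1 - r2)(n - 1) / (n - k - 1), where n is the number of pilots and k the number of predictors. Taking n = 24 and k = 5 (the participant count and number of segment scores given in the text), this sketch reproduces the Flight 1 F ratio in Table 9 to within the rounding of r2; with k = 2 it reproduces the adjusted multiple r squared of the stepwise model in Table 10. The function names are illustrative.

```python
def regression_f(r2, n, k):
    """F ratio on the regression, with k and n - k - 1 degrees of freedom."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def adjusted_r2(r2, n, k):
    """Multiple r squared adjusted for the number of predictors k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Table 9, Flight 1: n = 24 pilots, k = 5 segment scores
print(round(regression_f(0.662, 24, 5), 2))   # 7.05 (Table 9 reports 7.062)
# Table 10: stepwise model with k = 2 segments
print(round(adjusted_r2(0.627, 24, 2), 3))    # 0.591
```

The small gap between 7.05 and the tabled 7.062 comes from starting with r2 rounded to three decimals.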

A linear regression equation includes a Y intercept and a value for each
independent variable known as a beta weight. These are reported in the table.
There are essentially three regression equations in table 9. It was gratifying to
note that the intercepts and beta weights for the two flights were relatively
similar. Using any of the three regression equations, the segment scores from each
pilot can be used to predict group membership. These predicted values should fall
close to the range from 1 to 2. Ideally, all journeymen would receive a prediction
of 2, and all masters would receive a 1. Incidentally, most of the beta weights
were negative because of the arbitrary coding of masters as 1 and journeymen as 2.

Once a cutoff point is selected, it is a simple matter to count the number of
correct predictions, which is listed in the table as the relative frequency of
correct classification. Using the multilinear regression equation with the two
flights pooled, 23 out of 24 participants could be correctly classified. One
journeyman was misclassified as a masters-level pilot. This particular individual
apparently performed better than his journeyman peers.
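Applying a prediction equation and counting correct classifications can be sketched as below. The weights are the Flight 1 equation from Table 9; the cutoff of 1.4 is the value the report says was used for the automated data, with predicted scores at or above it classified as journeymen. Feeding in the Table 8 segment means is purely illustrative, since those means average over both pilot groups; the function names are hypothetical.

```python
# Flight 1 equation from Table 9: Y intercept and one weight per segment
INTERCEPT = 4.620
WEIGHTS = {"climb": -0.052, "en_route": -1.133, "descent": -0.557,
           "initial_approach": -0.325, "final_approach": -0.068}
CUTOFF = 1.4  # predicted values at or above this are classified as journeymen


def predict_group(scores):
    """Predicted group value: near 1 suggests a master, near 2 a journeyman."""
    return INTERCEPT + sum(w * scores[seg] for seg, w in WEIGHTS.items())


def classify(predicted, cutoff=CUTOFF):
    """1 (master) below the cutoff, 2 (journeyman) at or above it."""
    return 1 if predicted < cutoff else 2


def correct_count(predicted_values, actual_groups, cutoff=CUTOFF):
    """Number of correct classifications and the total, e.g. (23, 24)."""
    hits = sum(classify(p, cutoff) == g for p, g in zip(predicted_values, actual_groups))
    return hits, len(actual_groups)


# Illustrative input only: the overall PPI segment means from Table 8
table8_means = {"climb": 1.36606, "en_route": 1.59050, "descent": 1.28431,
                "initial_approach": 1.46019, "final_approach": 1.81561}
print(round(predict_group(table8_means), 3))  # 1.434
```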

While the multilinear regression technique uses all the segment scores to develop a
prediction equation, stepwise regression uses only those variables which enhance
prediction and ignores the rest. It begins with the variable that correlates best
with the criterion (master-journeyman) and, in stepwise fashion, adds variables
until they no longer provide a significant contribution. The results of a stepwise
regression (table 10) indicate that comparable accuracy can be achieved with only
the en route and descent segments of flight. These two segments do about as well
as the whole flight in separating the two pilot groups.
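A simplified forward-selection sketch of the stepwise idea follows. At each step it adds the candidate that raises r squared the most, and it stops when the partial F for that candidate falls below an F-to-enter threshold. The threshold of 4.0 and all names are assumptions; BMDP's stepwise programs can also remove variables that lose their contribution, which this sketch omits.

```python
def r_squared(columns, y):
    """r squared for a least-squares fit of y on the given columns plus an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination
    a = [[sum(r[i] * r[j] for r in design) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(design, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(a[row][c]))
        a[c], a[p] = a[p], a[c]
        for row in range(k):
            if row != c:
                f = a[row][c] / a[c][c]
                a[row] = [x - f * z for x, z in zip(a[row], a[c])]
    coef = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(coef, r)) for r in design]
    y_bar = sum(y) / len(y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_resid / ss_total


def forward_stepwise(predictors, y, f_to_enter=4.0):
    """Forward selection; predictors maps a segment name to its list of scores."""
    chosen, r2_current, n = [], 0.0, len(y)
    while len(chosen) < len(predictors):
        # Candidate that raises r squared the most
        best_name, best_r2 = max(
            ((name, r_squared([predictors[v] for v in chosen + [name]], y))
             for name in predictors if name not in chosen),
            key=lambda pair: pair[1])
        df_resid = n - len(chosen) - 2  # n minus predictors in the new model minus 1
        partial_f = (best_r2 - r2_current) / ((1.0 - best_r2) / df_resid)
        if partial_f < f_to_enter:
            break
        chosen.append(best_name)
        r2_current = best_r2
    return chosen, r2_current


# Hypothetical data: "a" tracks group membership closely, "b" only loosely
y = [1, 1, 1, 1, 2, 2, 2, 2]
predictors = {"a": [1.0, 1.2, 0.9, 1.1, 2.1, 1.8, 2.0, 2.2],
              "b": [3, 1, 4, 1, 5, 9, 2, 6]}
print(forward_stepwise(predictors, y))
```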

This  becomes  especially  clear  when  examining  a  histogram  of  the  canonical  variable 
(figure  2)  for  pilot  performance  developed  from  using  only  these  two  segments  of 
flight.  One  need  not  dwell  on  the  actual  values  of  the  canonical  variable.  It  is 
simply  a  standardized  conversion  of  the  predicted  pilot  performance  scores.  What 
is  important  is  that  there  is  only  one  overlap  between  the  two  groups,  which  is  an 
enviable  finding  in  any  prediction  system. 




A word of caution must be stated concerning the results of these regression
analyses. Gondek (1981), in an article in Educational and Psychological
Measurement, noted that statistical package software (we employed BMDP) tends to
overestimate the quality of predictions. The problem is compounded when group
membership is predicted using the same data that were employed to develop the
regression equations. Ideally, a new set of data should be used to establish the
validity of the regression equations. However, even assuming that we may be
overpredicting, the relationships are so strong that they would be expected to hold
in a replication of the experiment, although prediction accuracy might decrease
slightly.


TABLE 10.  STEPWISE REGRESSION ON PPI SCORES (FLIGHTS POOLED)

Multiple    Multiple    Adjusted        Regression    Relative Frequency of
r           r2          Multiple r2     F Ratio       Correct Classification
0.792       0.627       0.591           17.63**       23/24

** P<.01

Regression Intercept and Weights

Y Intercept    En Route    Descent
4.778          -1.623      -0.562

[Figure: histogram of the canonical variable for the masters (M) and journeymen (J) data.]

FIGURE 2.  HISTOGRAM OF THE PILOT PERFORMANCE INDEX CANONICAL VARIABLE


PERFORMANCE RATINGS.  Independent performance ratings by three observers were
completed  on  each  flight.  The  rating  form  is  presented  in  appendix  D.  One  rating 
was  completed  during  the  flight  simulation  by  the  instructor  pilot,  who  was 
familiar  with  the  participants.  The  second  and  third  ratings  were  accomplished  by 
experienced  pilots,  who  examined  video  tapes  of  the  flights  and  the  flight  track 
plots.  Every  attempt  was  made  to  conceal  the  identity  and  group  membership  of  the 
participants.  However,  since  the  video  tape  contained  an  audio  track  of  air-ground 
communications,  raters  may  not  have  been  completely  "blind"  because  of  the 
possibility  of  voice  recognition. 

The  first  step  in  the  data  analysis  was  the  evaluation  of  interrater  reliability. 
Obviously,  if  the  raters  did  not  agree  with  one  another,  the  measurement  system  had 
little  potential.  Only  the  eight-point  rating  scales  in  the  evaluation  form  were 
used  for  this  and  all  subsequent  analyses.  All  dichotomous  (two-point,  yes-no)  and 
other  non-eight-point  scales  were  dropped.  They  had  been  included  primarily 
for  the  comfort  of  the  raters,  who  felt  a  need  for  them.  Visual  examination 
indicated  a  lack  of  reliability,  and  the  effort  required  to  rescale  them  did 
not  seem  valuable.  Also,  one  flight  was  lost  because  of  video  taping  problems 
(Participant  23,  Flight  1). 

Interrater reliability was first computed using correlation on all eight-point
scales within each flight for each pair of raters. These correlations for each
flight are presented in appendices K and L. The results are summarized in table 11,
which presents reliability correlations between pairs of raters when all the data
across flights are used. There was a great deal of consistency across rater pairs.
There was also an obvious difference between the reliabilities for masters and for
journeymen, with more variability between raters when evaluating journeymen
performance. This was not surprising, since the journeymen demonstrated more inter-
and intra-participant variability in their performance.

After  computing  unweighted  summated  ratings  for  each  rater  on  each  segment  of 
flight,  reliability  correlations  were  repeated.  The  summated  ratings  were  actually 
an average of the ratings within each flight segment. For example, the en route
segment  had  four  rating  scales:  course  alignment,  altitude,  pitch  and  bank,  and 
positive  control.  These  were  summed,  and  the  total  for  each  rater  was  divided  by 
four.  These  summated  scales  were  then  correlated  between  raters.  The  results  were 
very  encouraging  (table  12).  Using  summated  scales,  interrater  reliability  was 
acceptable  by  any  standard  of  test  and  measurement.  The  reader  is  reminded  that 
the  closer  the  correlation  is  to  one,  the  stronger  the  relationship.  Based  on 
these  results,  it  was  decided  to  average  the  summated  ratings  across  the  three 
raters  and  use  those  data  points  in  subsequent  analyses.  What  this  produced  was  a 
performance  rating  number  for  each  pilot  on  each  segment  of  flight. 
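The summated-rating step for a single segment can be sketched as below, using the four en route scales named above. The scale order, function names, and the three raters' scores are hypothetical.

```python
from statistics import mean

EN_ROUTE_SCALES = ("course alignment", "altitude", "pitch and bank", "positive control")


def summated_rating(scores):
    """Unweighted summated rating: the scale scores summed and divided by their number."""
    return sum(scores) / len(scores)


def segment_rating(ratings_by_rater):
    """Average the raters' summated ratings into one score for the segment."""
    return mean(summated_rating(scores) for scores in ratings_by_rater)


# Hypothetical en route ratings from three raters, in EN_ROUTE_SCALES order
raters = [(7, 6, 7, 8), (6, 6, 7, 7), (7, 5, 6, 8)]
assert all(len(r) == len(EN_ROUTE_SCALES) for r in raters)
print(round(segment_rating(raters), 2))  # 6.67
```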

TABLE 11.  INTERRATER RELIABILITY CORRELATIONS

                  Rater Pairing
Pilot Group       1-2      1-3      2-3
Journeymen        0.77     0.76     0.76
Masters           0.91     0.88     0.94


TABLE 12.  INTERRATER RELIABILITY EMPLOYING SEGMENT MEANS
           FOR EACH RATER AS DATA POINTS FOR CORRELATIONS

                  Rater Pairing
Pilot Group       1-2      1-3      2-3
Masters           0.993    0.993    0.997
Journeymen        0.951    0.961    0.948
All Pilots        0.976    0.981    0.977

The data for each segment of flight were then analyzed using a two-way,
pilots-by-flights ANOVA. The results indicated a strong pilots effect for every
segment except the takeoff (table 13). This meant that, as with the automated
performance data, performance ratings showed rather consistent superiority on the
part of the experienced masters when contrasted with the journeymen. Although the
turn segment showed the same effect, it also produced a significant flights effect:
both pilot groups were rated higher on the second flight. The absence of an
interaction between flights and pilot group for the turns indicates that the
flights effect was probably one of route familiarity rather than a true performance
improvement. If the latter had been the case, one might have expected a larger
change in performance from the journeymen than from the masters group. Since we
were trying to minimize transitory learning or familiarity effects in this
measurement, turns were deleted from further analysis.

A  descriptive  summary  of  the  performance  rating  data  is  provided  in  table  14. 
Visual  examination  indicates  a  possible  difference  between  the  two  pilot  groups  and 
some  variability  across  flight  segments.  There  appears  to  be  a  slight  improvement 
from  the  first  to  second  flights. 

These appearances are confirmed in part by the ANOVA described in table 15. Before
discussing this analysis, a word of caution should be sounded. The ANOVAs were
computed on the segment scores for screening purposes only. The ANOVA in table 15
should be thought of as informative rather than conclusive because of the nature of
the data and the theoretical model on which ANOVA is based. Although questionnaire
and rating scale measures are often subjected to inferential techniques (such as
ANOVA) in applied research, the data entering the analyses may or may not meet the
assumptions of the model (i.e., interval quality measures). We continue doing these
types of analyses because nothing else matches the descriptive power of an ANOVA
partition of variance. In fairness to the use of ANOVA in this particular case, the
results will be confirmed to a large extent by regression techniques to be reported
later. Regression models are less restrictive but also less powerful than ANOVA.




TABLE 13.  ANALYSIS OF VARIANCE ON FLIGHT SEGMENT PERFORMANCE RATINGS

                                     Pilots                   Flights
                     Number of    F          Correlation   F          Correlation
Segment              Variables    Ratio      Ratio         Ratio      Ratio
Takeoff              1            0.10       0.36%         1.61       1.94%
Climb                4            14.63**    30.62%        1.11       1.45%
En Route             4            37.97**    51.40%        1.33       1.33%
Descent              3            39.85**    46.60%        1.95       2.45%
Initial Approach     4            41.61**    52.02%        1.61       1.71%
Final Approach       4            22.23**    36.55%        3.89       4.63%
Turns                4            41.74**    53.45%        10.34**    6.53%

** P<.01

Note: Ratings for in-cockpit and postflight tape observers averaged.
Multiple segments for turn and en route segments averaged.


TABLE 14.  MEAN PERFORMANCE RATINGS

                                  Flight           Pilot Group
Pilot Group    Segment          1        2         Mean
Masters        Climb            7.43     7.64
               En Route         7.03     7.24
               Descent          7.70     7.74
               I Approach       6.73     7.08
               F Approach       6.48     6.73
               Flight Mean      7.07     7.29      7.18

Journeymen     Climb            6.50     6.70
               En Route         5.40     5.70
               Descent          5.93     6.58
               I Approach       4.59     5.01
               F Approach       3.71     5.05
               Flight Mean      5.23     5.81      5.52


TABLE 15.  PERFORMANCE RATING ANALYSIS OF VARIANCE SUMMARY
           (Pilots by Flights by Segments)

Source of                                 Correlation    F
Variability                DF    MS       Ratio          Ratio
Pilots (P)                 1     152.49   32.97%         63.08**
Error                      20    2.42
Flights (F)                1     8.56     1.85%          6.95*
F x P Interaction          1     1.80     0.39%          1.46
Error                      20    1.23
Segments (S)               4     20.89    18.07%         26.36**
S x P Interaction          4     2.99     2.58%          3.77**
Error                      80    0.79
F x S Interaction          4     0.59     0.51%          0.74
F x S x P Interaction      4     0.59     0.51%          0.75
Error                      80    0.79

** P<.01
*  P<.05


With  this  qualification,  it  would  appear  that  the  inferences  made  descriptively 
are  confirmed.  Masters  did  perform  significantly  better  than  journeymen.  This 
lends  concurrent  support  to  the  results  of  the  APM.  There  was  also  significant 
variability  across  segments  which  interacted  with  the  pilots  variable.  This  meant 
that  performance  differences  across  segments  varied  between  the  two  pilot  groups. 
A  flights  effect,  which  did  not  interact  with  pilot  group,  was  very  slight  but 
significant.  The  small  correlation  ratio  for  the  flights  effect,  1.85  percent, 
means  that  although  it  existed,  it  was  so  weak  that  from  a  practical  viewpoint  it 
could  be  discounted.  In  fact,  if  operating  in  the  terms  of  a  statistical  purist, 
it  would  be  viewed  as  nonexistent  because  it  did  not  reach  the  P<.01  level  of 
significance. 
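The correlation ratios in these ANOVA tables appear to be eta squared, each effect's sum of squares as a share of the total sum of squares. Recovering each sum of squares as DF x MS reproduces the correlation-ratio column of Table 15 to rounding, which supports that reading; the sketch below is offered under that assumption, with illustrative names.

```python
# Table 15 rows as (source, degrees of freedom, mean square)
TABLE_15 = [
    ("Pilots (P)", 1, 152.49), ("Error", 20, 2.42),
    ("Flights (F)", 1, 8.56), ("F x P", 1, 1.80), ("Error", 20, 1.23),
    ("Segments (S)", 4, 20.89), ("S x P", 4, 2.99), ("Error", 80, 0.79),
    ("F x S", 4, 0.59), ("F x S x P", 4, 0.59), ("Error", 80, 0.79),
]


def correlation_ratios(rows):
    """Each effect's sum of squares (DF * MS) as a percentage of the total sum of squares."""
    total_ss = sum(df * ms for _, df, ms in rows)
    return {src: 100.0 * df * ms / total_ss for src, df, ms in rows if src != "Error"}


ratios = correlation_ratios(TABLE_15)
print(round(ratios["Pilots (P)"], 2), round(ratios["Flights (F)"], 2))  # 32.97 1.85
```

On this reading, the flights effect accounts for under 2 percent of the total variability even though its F ratio reaches P<.05, which is the distinction the paragraph above draws.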

The interaction between pilot group and flight segments meant that comparisons
between specific flight segments (post-hoc tests) had to be completed on the
masters and journeymen groups separately. The results of the Newman-Keuls analyses
are presented for both groups in table 16. The mean performance ratings for the
flight segments of each group are ordered by magnitude. To review briefly, the
differences between these means are computed and compared against the significance
criteria. The significance level of P<.01 was employed throughout this table. The
lines above the segments indicate that there is no significant difference between
those segments; flight segments which do not share a common line are significantly
different. The journeymen's performance varied considerably more across segments of
flight than did that of the masters pilots. This confirms what might be viewed as
"common sense" knowledge: the more experience, the greater the consistency of
performance.
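The Newman-Keuls decision rule used in these tables can be sketched as follows. The means are ordered, the widest ranges are tested first, and a pair spanning r ordered steps is significant when its difference exceeds the criterion for r steps, unless it lies inside a wider range that was already found non-significant. Applied to the masters' means and criteria from Table 16, it flags the same five pairs that carry asterisks there. The function name and data layout are illustrative.

```python
def newman_keuls(means, criteria):
    """Flag significant pairwise differences among ordered means.

    means: list of (label, mean) sorted from smallest to largest mean.
    criteria: critical difference for each number of ordered steps (2, 3, ...).
    """
    n = len(means)
    nonsig = []          # (i, j) ranges already found non-significant
    significant = set()
    for steps in range(n, 1, -1):            # widest ranges first
        for i in range(n - steps + 1):
            j = i + steps - 1
            enclosed = any(a <= i and j <= b for a, b in nonsig)
            diff = means[j][1] - means[i][1]
            if not enclosed and diff > criteria[steps]:
                significant.add((means[i][0], means[j][0]))
            else:
                nonsig.append((i, j))
    return significant


# Masters' ordered mean ratings and P<.01 criteria from Table 16
masters = [("F Approach", 6.606), ("I Approach", 6.911), ("En Route", 7.139),
           ("Climb", 7.539), ("Descent", 7.714)]
criteria = {2: 0.499, 3: 0.567, 4: 0.608, 5: 0.639}
for pair in sorted(newman_keuls(masters, criteria)):
    print(pair)
```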




TABLE 16.  PERFORMANCE RATINGS NEWMAN-KEULS ANALYSIS FOR FLIGHT SEGMENTS EFFECTS

Masters Pilots
                     Final       Initial
Segment:             Approach    Approach    En Route    Climb       Descent
Mean Rating:         6.606       6.911       7.139       7.539       7.714

F Approach   6.606               0.305       0.533       0.933**     1.108**
I Approach   6.911                           0.228       0.628**     0.803**
En Route     7.139                                       0.400       0.575**
Climb        7.539                                                   0.175
Descent      7.714

** P<.01

Significance Criteria
Ordered Steps:       2          3          4          5
                     0.499      0.567      0.608      0.639

Analysis Summary
Segment:  Final Approach   Initial Approach   En Route   Climb   Descent

Journeymen Pilots
                     Final       Initial
Segment:             Approach    Approach    En Route    Descent     Climb
Mean Rating:         4.385       4.799       5.550       6.258       6.600

F Approach   4.385               0.414       1.165**     1.873**     2.215**
I Approach   4.799                           0.751**     1.459**     1.801**
En Route     5.550                                       0.708**     1.050**
Descent      6.258                                                   0.342
Climb        6.600

** P<.01

Significance Criteria
Ordered Steps:       2          3          4          5
                     0.499      0.567      0.608      0.639

Analysis Summary
Segment:  Final Approach   Initial Approach   En Route   Descent   Climb


Multilinear regression analyses were applied to the performance rating data. Group
membership was regressed on the pilots' segment performance ratings for climb, en
route, descent, initial approach, and final approach. The dependent variable was
arbitrarily coded as 1 for masters and 2 for journeymen. A separate analysis was
completed on the data for each flight and for the flights pooled by averaging
(table 17). Results indicated relatively high multiple correlations, and all the
regressions were significantly different from zero at P<.01. Classification was
accomplished using the same criterion (1.4) as had been used for the automated
data: all participants with a predicted score of 1.4 or higher were classified as
journeymen. Classification was 100 percent accurate for the first flight but
dropped to 91 percent for the second. When all the data were pooled, it returned to
100 percent. The cautions cited by Gondek (1981) apply here as they did when
discussing the automated data. The accuracy of classification may be inflated
somewhat by the packaged software but is still impressive.


TABLE 17.  MULTILINEAR REGRESSION DATA ON PERFORMANCE RATINGS

                  Multiple    Multiple    F Ratio on        Relative Frequency of
                  r           r2          the Regression    Correct Classification
Flight 1          0.844       0.713       7.94**            22/22
Flight 2          0.819       0.671       6.52**            20/22
Flights Pooled    0.896       0.802       12.99**           22/22

** P<.01

Regression Intercept and Weights

                  Y                     En                    I           F
                  Intercept   Climb     Route      Descent    Approach    Approach
Flight 1          3.967       -0.040    -0.122     -0.121     -0.079      -0.033
Flight 2          4.643       -0.122    -0.026     -0.133     -0.087      -0.105
Flights Pooled    4.247        0.115    -0.060     -0.338     -0.037      -0.109


A stepwise regression on the same data employed in the last multilinear analysis on
the pooled flights provided very similar results using the input of only two of the
five flight segments: "Descent" and "Final Approach" (table 18). The stepwise
regression selects independent variables based on their correlations with the
dependent variable (master-journeyman) and attempts to choose those which
contribute most to the accountable variability, as indicated by the multiple r
squared. The selection of descent and final approach in the performance rating data
should not be considered a definitive demonstration of their relevance. Several
other segments were very close; in fact, an alternative software package might just
as likely have selected "En Route" and "Initial Approach." This is because the
intercorrelations between segment data were much higher for the performance ratings
than they were for the automated data.

A histogram of the canonical variable produced by standardizing the predicted
values from the stepwise regression is very informative (figure 3). The clear-cut
separation between the two pilot groups is evident, and there were no overlaps as
there had been for the PPI data. The relative frequency of correct classification
for the pooled flight data was 100 percent, as also indicated in tables 17 and 18.
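As a rough check on the stepwise equation, the Table 18 weights can be applied to group-level inputs. The sketch below plugs in each group's Descent and Final Approach ratings averaged over the two flights of Table 14. Because the equation is linear, the result should approximate each group's average predicted score, assuming Table 14 covers the same pilots that entered the regression. It illustrates the equation; it is not a reanalysis of individual pilots.

```python
# Stepwise equation from Table 18 (flights pooled)
INTERCEPT, W_DESCENT, W_FINAL_APPROACH = 4.586, -0.337, -0.133
CUTOFF = 1.4  # predicted values at or above this are classified as journeymen


def predict(descent, final_approach):
    """Predicted group value from the Descent and Final Approach ratings."""
    return INTERCEPT + W_DESCENT * descent + W_FINAL_APPROACH * final_approach


# Group means over the two flights, taken from Table 14
masters = predict((7.70 + 7.74) / 2, (6.48 + 6.73) / 2)
journeymen = predict((5.93 + 6.58) / 2, (3.71 + 5.05) / 2)
print(f"{masters:.2f} {journeymen:.2f}")  # 1.11 1.90
```

Both group values fall well on their own side of the 1.4 cutoff, consistent with the perfect separation in figure 3.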

PILOT WORKLOAD.  Workload in this experiment was measured using two methods:
inflight and postflight. The inflight method requested a response from the pilot
every minute. These responses were made on a 10-point scale which was described in
an earlier section. Higher numbers represented higher levels of perceived workload.
If the pilot failed to respond within 1 minute, the computer automatically recorded
a maximum workload response of 10 and a maximum delay of 60 seconds. This event was
the exception rather than the rule.

A  visual  inspection  of  the  data  indicated  that  the  very  short  duration  of  the  climb 
segment,  coupled  with  the  sampling  rate  of  once  per  minute  for  inflight  workload, 
made  the  data  suspect.  The  climb  segment  was  deleted  from  the  inflight  workload 
analysis.  This  left  four  regular  segments  of  flight  (en  route,  descent,  initial 
approach,  and  final  approach)  and  one  additional  segment  referred  to  as  "other." 
This  was  a  catch-all  segment  which  included  all  portions  of  the  flight  not 
otherwise classified. It consisted primarily of turn information. Before
analysis,  the  data  were  organized  pooling  all  like  segments.  This  applied  to  the 
en  route  segment  only,  which  contained  two  legs  or  elements  that  were  flown  on 
different  courses.  There  was  only  one  leg  for  each  of  the  other  segments.  The 
data  were  further  processed  by  averaging  all  the  sample  points  within  a  segment  for 
each  pilot  on  each  flight.  These  workload  "segment  scores"  became  the  data  points 
which  were  analyzed. 
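Reducing the once-per-minute responses to segment scores can be sketched as follows. The tuple layout, segment labels, and function name are hypothetical. The rules they encode come from the text: an unanswered query is recorded as a rating of 10 with a 60-second delay, the climb segment is dropped, the two en route legs are pooled, and all samples within a segment are averaged.

```python
from statistics import mean

TIMEOUT_RATING = 10   # recorded when the pilot does not answer within 1 minute
TIMEOUT_DELAY = 60    # seconds


def segment_scores(samples):
    """Reduce once-per-minute workload samples to one score pair per segment.

    samples: (segment, rating, delay_seconds) tuples; rating and delay are None
    when the query went unanswered. Both en route legs carry the label
    "en route", so they are pooled. Returns {segment: (mean rating, mean delay)}.
    """
    ratings, delays = {}, {}
    for segment, rating, delay in samples:
        if segment == "climb":        # too short for once-per-minute sampling
            continue
        if rating is None:
            rating, delay = TIMEOUT_RATING, TIMEOUT_DELAY
        ratings.setdefault(segment, []).append(rating)
        delays.setdefault(segment, []).append(delay)
    return {seg: (mean(ratings[seg]), mean(delays[seg])) for seg in ratings}


# Hypothetical samples for one pilot on one flight
samples = [("climb", 5, 3), ("en route", 3, 4), ("en route", 4, 6),
           ("descent", None, None), ("descent", 6, 10), ("en route", 2, 2)]
print(segment_scores(samples))
```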

An  examination  of  the  mean  perceived  workload  for  masters  and  journeymen  pilots 
appears  to  show  a  considerable  difference  between  the  two  groups  (table  19). 
Masters pilots reported a mean workload across the two flights of only 3.68, while
journeymen  responded  with  a  mean  of  6.17. 




TABLE 18.  STEPWISE REGRESSION ON PERFORMANCE RATINGS (FLIGHTS POOLED)

Multiple    Multiple    Adjusted        F Ratio on        Relative Frequency of
r           r2          Multiple r2     the Regression    Correct Classification
0.889       0.790       0.767           35.65**           22/22

** P<.01

Regression Intercept and Weights

Y Intercept    Descent    F Approach
4.586          -0.337     -0.133


[Figure: STEPWISE REGRESSION-7M, histogram of the canonical variable for masters (M) and journeymen (J).]

FIGURE 3.  HISTOGRAM OF THE PERFORMANCE RATING CANONICAL VARIABLE


TABLE 19.  MEAN INFLIGHT WORKLOAD RESPONSES

                                  Pilot Group             Flight Segment
Segment             Flight    Master    Journeyman    Mean
En Route            1         2.63      5.43          4.03
Descent             1         4.08      6.15          5.12
Initial Approach    1         4.27      7.31          5.79
Final Approach      1         4.43      7.51          5.97
Other               1         4.21      6.09          5.15
En Route            2         2.55      4.76          3.66
Descent             2         3.82      5.59          4.70
Initial Approach    2         3.99      6.55          5.26
Final Approach      2         3.86      6.81          5.33
Other               2         2.94      5.53          4.23
Pilot Group Mean              3.68      6.17          4.92

An ANOVA was completed on these data, and the pilots effect (the difference between
the two pilot groups) was significant (table 20). Using the rule of thumb of
10 percent accountable variability as a guideline, the 30 percent seen in the
correlation ratio for the pilots effect adds to its credibility. Journeymen pilots
reported that they were working significantly harder, and the lack of a
segments-by-pilots interaction indicates that this held across all segments of
flight. The ANOVA indicated two other effects that were significant. There was a
slight flights effect, shown by a decrease in reported workload from the first to
the second flight. However, this effect accounted for very little variability,
1.60 percent. There were also significant differences across segments which did not
interact with the pilots variable. This meant that these differences followed a
similar pattern for both pilot groups.




TABLE 20.  INFLIGHT WORKLOAD ANALYSIS OF VARIANCE SUMMARY
           (Pilots by Flights by Segments)

Source of                                 Correlation    F
Variability                DF    MS       Ratio          Ratio
Pilots (P)                 1     343.42   30.49%         24.04**
Error                      20    14.28
Flights (F)                1     18.06    1.60%          6.06*
F x P Interaction          1     0.303                   0.10
Error                      20    2.98
Segments (S)               4     23.27    8.26%          9.88**
S x P Interaction          4     2.13                    0.90
Error                      80    2.35
F x S Interaction          4     0.514                   0.33
F x S x P Interaction      4     0.727                   0.47
Error                      80    1.55

*  P<.05
** P<.01

As  indicated  earlier,  a  significant  effect  in  an  ANOVA  serves  only  as  a  pointer 
that  there  are  differences  between  levels  of  a  variable.  It  does  not  explain  where 
the  differences  are.  A  Newman-Keuls  analysis  was  completed  across  the  flight 
segments  (table  21).  Because  the  pattern  was  the  same  for  both  pilot  groups,  their 
data  were  analyzed  together.  The  differences  between  segment  means  were  compared 
against  the  significance  criteria  listed  at  the  bottom  of  the  table.  Pilots 
reported that they were working significantly harder during initial and final
approaches  than  they  were  while  en  route.  This  finding  is  in  line  with  the 
"common  sense"  or  pragmatic  view  of  inflight  workload. 

In  addition  to  the  pilots'  workload  responses,  response  delay  was  also  recorded. 
This  was  the  time  in  seconds  from  the  moment  the  query  tone  was  sounded  until  the 
pilot  provided  a  response.  The  range  of  potential  delays  for  each  response  was 
from  0  to  60  seconds.  The  mean  response  delays  are  presented  in  table  22. 
Journeymen appear to produce longer response delays, and there appears to be
variability across segments. Both of these observations are misleading, as
demonstrated by the ANOVA in table 23. The only significant effect was a decrease in
response delay across the two flights. Since there was no flights-by-pilots
interaction, this result applied to both pilot groups. These results indicate that
response delay was functionally useless for the purposes of this experiment.




TABLE 21.  NEWMAN-KEULS ANALYSIS ON WORKLOAD SEGMENTS MAIN EFFECT (INFLIGHT)

                                                          Initial      Final
Segment:          En Route     Other        Descent       Approach     Approach
Mean Rating:      3.844        4.691        4.910         5.527        5.652

En Route     3.844             0.847        1.066         1.683**      1.808**
Other        4.691                          0.219         0.836        0.961
Descent      4.910                                        0.617        0.742
I Approach   5.527                                                     0.125
F Approach   5.652

** P<.01

Significance Criteria
Ordered Steps:    2         3         4         5
                  1.219     1.386     1.487     1.557

Analysis Summary
Segment:  En Route   Other   Descent   Initial Approach   Final Approach


TABLE 22.  MEAN DELAY (SECONDS) DATA SUMMARY

                                  Pilot Group             Flight Segment
Segment             Flight    Master    Journeyman    Mean
En Route            1         5.32      14.52         9.92
Descent             1         12.64     12.85         12.75
Initial Approach    1         7.03      17.76         12.40
Final Approach      1         8.80      13.30         11.05
Other               1         14.82     22.33         18.57
En Route            2         3.64      7.03          5.33
Descent             2         10.21     9.05          9.63
Initial Approach    2         5.82      6.47          6.15
Final Approach      2         7.17      6.01          6.59
Other               2         5.64      10.53         8.09
Pilot Group Mean              8.11      11.99         10.05


TABLE 23.  INFLIGHT RESPONSE DELAY ANALYSIS OF VARIANCE SUMMARY
           (Pilots by Flights by Segments)

Source of                                   Correlation    F
Variability                DF    MS         Ratio          Ratio
Pilots (P)                 1     826.44     2.17%          1.78
Error                      20    465.44
Flights (F)                1     1,837.11   4.84%          9.19**
F x P Interaction          1     358.36     0.94%          1.80
Error                      20    199.80
Segments (S)               4     220.47     2.30%          1.77
S x P Interaction          4     105.41     1.1%           0.83
Error                      80    124.38
F x S Interaction          4     89.76      0.94%          0.72
F x S x P Interaction      4     31.45      0.33%          0.25
Error                      80    123.98

** P<.01


An  additional  source  of  information  on  pilot  workload  was  a  four-item  questionnaire 
administered  at  the  completion  of  each  simulated  flight.  Like  all  such  measures, 
the  questionnaire  could  not  examine  pilot  workload  over  the  entire  flight  profile. 
It  could  only  sample  pilot  perceptions  at  the  flight's  termination.  Pilots  were 
asked  to  respond  on  eight-point  scales  (see  appendix  J).  The  mean  responses  for 
each  questionnaire  item  and  the  results  of  ANOVA  are  described  in  table  24.  As 
with  the  inflight  data,  masters  pilots  reported  lower  workload  than  journeymen. 
This  was  a  strong  and  significant  effect  on  all  questionnaire  items.  Three  out  of 
the four items also demonstrated a flights effect, with both groups of pilots
reporting  somewhat  lower  workload  in  the  second  flight.  This  was  in  line  with  the 
inflight  data. 

One  problem  with  questionnaire  data  is  that  items  are  often  redundant  with  each 
other.  This  means  that  responses  to  one  or  more  items  tend  to  be  similar  or 
identical.  Visual  inspection  of  the  data  led  to  the  conclusion  that  this  was 
probably  the  case,  and  a  factor  analysis  was  completed  on  the  data.  Factor 
analysis  is  a  statistical  technique  which  examines  the  relationships  between 
variables  and  determines  if  the  variance  can  be  explained  in  simpler  terms.  In  the 
case of the four-item questionnaire, all the items load on one factor. A factor is
a composite of all the variables which load on it. Factor loadings are correlations
of the variables with the factor. Factor loadings are presented in table 25.




TABLE 24.  POSTFLIGHT QUESTIONNAIRE RESULTS

First Question:  How hard were you working during this flight?

            Mean Responses                      Analysis of Variance
Flight   Masters        Journeymen      Variable      DF      F Ratio    Correlation Ratio
  1      4.33 (1.[?])   7.42 (1.38)     Pilots        1, 22   21.97***   [?]
  2      [?].08 (1.62)  6.25 (1.91)     Flights       1, 22   3.16       [?]
                                        Interaction   1, 22   1.32       [?]

Second Question:  What fraction of the time were you busy during the flight?

            Mean Responses                      Analysis of Variance
Flight   Masters        Journeymen      Variable      DF      F Ratio    Correlation Ratio
  1      4.75 (2.42)    7.75 (1.54)     Pilots        1, 22   38.[?]     [?]
  2      4.08 (2.16)    7.0[?] (1.50)   Flights       1, 22   4.24*      1.[?]%
                                        Interaction   1, 22   [?]        [?]

Third Question:  [illegible]

Fourth Question:  How did you feel during this flight (higher numbers indicate
higher stress)?

            Mean Responses                      Analysis of Variance
Flight   Masters        Journeymen      Variable      DF      F Ratio    Correlation Ratio
  1      4.58 (1.83)    7.25 (2.01)     Pilots        1, 22   17.15***   31.8%
  2      3.42 (1.38)    5.83 (1.99)     Flights       1, 22   9.51**     8.7%
                                        Interaction   1, 22   0.09       0%

*** P<.001   ** P<.01   * P<.05
Note:  Standard deviations are shown in parentheses.




TABLE 25.  FACTOR LOADINGS OF POSTFLIGHT QUESTIONNAIRE

Questionnaire Item    Loading
        1             0.902
        2             0.946
        3             0.934
        4             0.903


Since all the questionnaire items load on one factor, the questionnaire is
essentially a one-dimensional measure of workload. The same packaged software
(BMDP 4M) that performed the factor analysis also produced a workload factor score
for each individual on each flight. These scores were standardized, so that the
distribution of workload factor scores had a mean of zero and a standard deviation
of one, the same scale as the standard normal (bell-shaped) distribution.

These factor scores, which represented each individual's perception of workload as
measured after the flight, were correlated with a total inflight workload score
that was produced by summing the inflight responses across all the flight
segments. Correlations were computed for each pilot group separately and for all
of the data together. A scatterplot of all the data is presented in figure 4. The
pooled correlation of 0.823 indicates a strong positive relationship between the
two data sets (inflight and postflight). When masters pilots are considered alone,
this relationship holds (figure 5): a correlation of 0.858 indicates that the
inflight and postflight measures were consistent. When journeymen were considered
alone, however, there was much less consistency (figure 6). The correlation was
0.451, which indicates a low-to-moderate positive relationship.
These findings were similar to those of an earlier experiment in which difficulty
level was varied for a group of experienced pilots more like the masters in the
current study (Stein and Rosenberg, 1983). In the earlier study, at low-to-moderate
difficulty, inflight and postflight measures of workload were highly correlated.
In the most difficult flight, this relationship broke down, and it became obvious
that the two types of measures were really measuring different aspects of the
workload experience. In the masters-journeymen study, there was one level of
difficulty but two levels of perceived workload. For the journeymen, who had to
work harder to deliver a mean performance that was not equal to that of the
masters group, the construct of workload apparently takes on more dimensions that
differ between inflight experience and postflight memory.
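The correlation procedure described above (sum each pilot's inflight workload responses over the flight segments, then correlate that total with the postflight factor score, separately for each group and for the pooled sample) can be sketched in Python as follows. The record layout, function names, and numbers are hypothetical. Only the procedure comes from the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def workload_correlations(records):
    """records: (group, inflight responses per segment, postflight factor score),
    one record per pilot per flight. Returns r for each group and for all data."""
    results = {}
    for group in sorted({g for g, _, _ in records}):
        rows = [r for r in records if r[0] == group]
        # Total inflight workload = sum of the segment responses.
        results[group] = pearson_r([sum(r[1]) for r in rows], [r[2] for r in rows])
    results["all"] = pearson_r([sum(r[1]) for r in records], [r[2] for r in records])
    return results


# Hypothetical records: (group, inflight segment responses, factor score).
records = [
    ("masters", [2, 3, 2, 4], -1.1), ("masters", [3, 3, 3, 4], -0.6),
    ("masters", [2, 2, 2, 3], -1.3), ("journeymen", [5, 6, 5, 7], 0.9),
    ("journeymen", [6, 7, 6, 7], 0.7), ("journeymen", [4, 5, 5, 6], 1.2),
]
print(workload_correlations(records))
```

Because the pooled sample spans both groups, its correlation can be high even when the within-group correlations are weak, which is one reason the report examines each group separately.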

COMPARISON BETWEEN KEY VARIABLES. A number of measures of workload and performance
have been discussed. Some of the most interesting findings of this study are those
that concern the relationships between key measurement variables. In the workload
section of the results, it was apparent that the inflight workload measure (when
pooled across the flight segments) produced results similar to those of the
postflight questionnaire. The remainder of this section discusses the correlations
between other pairs of key variables, illustrated with scatterplots and, where
applicable, regression lines.


FIGURE 4.  SCATTERPLOT OF WORKLOAD VARIABLES — MASTER AND JOURNEYMAN PILOTS
(Axes: inflight workload total; postflight workload factor)

FIGURE 6.  SCATTERPLOT OF WORKLOAD VARIABLES — JOURNEYMAN PILOTS
(Axes: inflight workload total; postflight workload factor)

The first relationship considered was that between the traditional measure of
pilot performance, the rating scales, and the results of the APM System using the
PPI. Correlations and scatterplots were computed for each pilot group individually
and for the entire sample together. Figure 7 shows that a weak relationship existed
between the performance ratings and PPI scores for masters pilots. Note that the
data on both axes have been standardized by converting them to z scores. This
provides a better basis for comparison, since it places both variables on a common
scale. In figure 7, we see a much wider dispersion of scores in the PPI than in the
performance ratings. A tendency of observers to avoid the end points of a scale is
a common problem in rating-type data. However, it is also possible that with the
masters pilot group, which was fairly homogeneous, the observers were not as
discriminating as the PPI. In figure 8, the spread of performance ratings was much
greater for journeymen, and consequently the relationship between the two
variables was much stronger (r = .75). Finally, figure 9 shows a scatterplot for
the entire participant sample, in which the difference in performance spread
between the pilot groups becomes apparent. Given this heterogeneity of
performance, the correlation of r = .82 demonstrates that, overall, the PPI appears
to be valid against the traditional measurement system. However, with a
homogeneous group of performers like the master-level pilots, the PPI and the
performance ratings diverge in their ability to separate individuals on a
performance continuum.
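Converting both variables to z scores, as was done for figures 7 through 9, changes their units but not their correlation. On standardized axes, the least-squares slope also equals r. The Python sketch below illustrates both points with hypothetical PPI and rating values. It uses the sample standard deviation (n - 1), which is an assumption because the report does not say which form was used. Either form gives the same correlation and slope as long as it is applied to both variables.

```python
import math
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def ls_slope(xs, ys):
    """Slope of the least-squares regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


# Hypothetical PPI scores and observer rating totals for six flights.
ppi = [412.0, 388.5, 450.2, 401.7, 365.0, 430.9]
ratings = [31, 28, 33, 30, 27, 30]
zp, zr = z_scores(ppi), z_scores(ratings)
print(round(pearson_r(ppi, ratings), 4), round(pearson_r(zp, zr), 4))  # same value
print(round(ls_slope(zp, zr), 4))  # equals r on standardized axes
```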

Using standardized data, the PPI was compared to the pilots' inflight workload
responses. The first comparison was made using total flight scores for both
variables. Figure 10 is a scatterplot for the masters pilot group. No relationship
existed between their inflight workload responses and PPI scores. The journeymen
pilots, when considered alone, showed a mild negative relationship (r = -.29)
between workload and performance (figure 11). When both groups were considered
together, a broader range of workload and performance was depicted, and a moderate
correlation (r = -.567) appeared (figure 12). Pilots tended to report lower
subjective workload when they performed at higher levels. In general, journeymen
pilots felt they had to work harder to produce less. Although the scatterplot in
figure 12 might suggest that a curvilinear regression would account for more of
the variability between workload and performance than the linear model, this was
not the case. Fitting polynomial regressions to the data did not improve the
correlation markedly: the correlations for quadratic and cubic fits were r = -.567
and r = -.573, respectively.
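The check for curvilinearity described above, which compares linear, quadratic, and cubic least-squares fits, can be sketched in Python as below. The sketch reports the multiple correlation R (the square root of the proportion of variance explained), which is never negative. The report quotes these values with the sign of the linear relationship. The sketch solves the normal equations directly, which is adequate for small standardized data sets but is not necessarily how a statistical package computes the fit. The function names and data are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def polynomial_R(xs, ys, degree):
    """Multiple correlation R for a least-squares polynomial fit of the given degree."""
    k = degree + 1
    # Normal equations (X^T X) beta = X^T y, with columns 1, x, x^2, ...
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    my = sum(ys) / len(ys)
    ss_res = sum((y - sum(b * x ** i for i, b in enumerate(beta))) ** 2
                 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))


# Hypothetical standardized (workload, PPI) pairs.
workload = [-1.6, -1.1, -0.7, -0.2, 0.1, 0.5, 0.9, 1.4, 1.8]
ppi = [1.2, 1.0, 0.9, 0.1, 0.3, -0.4, -0.2, -1.1, -1.6]
for degree, label in [(1, "linear"), (2, "quadratic"), (3, "cubic")]:
    print(label, round(polynomial_R(workload, ppi, degree), 3))
```

Because each higher-degree model contains the lower-degree one, R can only stay the same or grow. The practical question, as in the text, is whether it grows enough to matter.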

Since the inflight workload (when summed for the whole flight) and the postflight
workload questionnaire results were strongly correlated, the next set of
comparisons will not be surprising. The postflight workload factor scores were
correlated with the APM data. For the master pilots, there was no relationship
(figure 13). In contrast, the journeymen pilots had a low but significant
relationship (r = -.42, P<.01) (figure 14). When all data were considered, the
postflight workload factor produced a correlation with the APM data very similar
to that of the inflight measure (r = -.57, P<.01) (figure 15).
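Whether a correlation is statistically significant depends on the number of pairs, n. The usual test converts r to a t statistic with n - 2 degrees of freedom, t = r * sqrt(n - 2) / sqrt(1 - r^2). The Python sketch below computes it. The example values are hypothetical because this section does not state the n behind each correlation, and the p value must still be read from a t table or a statistics library.

```python
import math


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing the hypothesis that the population correlation is zero."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hypothetical example: r = .50 observed on 18 pairs (df = 16).
print(round(t_for_r(0.5, 18), 3))
```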


FIGURE 7.  SCATTERPLOT AND REGRESSION, AUTOMATED PERFORMANCE
MEASUREMENT RATINGS — MASTER PILOTS

FIGURE 8.  SCATTERPLOT AND REGRESSION, AUTOMATED PERFORMANCE
MEASUREMENT RATINGS — JOURNEYMAN PILOTS
(Axis: automated performance measurement using pilot performance index)

FIGURE 9.  SCATTERPLOT AND REGRESSION, AUTOMATED PERFORMANCE
MEASUREMENT RATINGS — ALL PILOTS
(Axis: performance ratings)

FIGURE 11.  SCATTERPLOT AND REGRESSION, INFLIGHT WORKLOAD AND AUTOMATED
PERFORMANCE MEASUREMENT — JOURNEYMAN PILOTS
(Axes: inflight workload; automated performance measurement using pilot performance index)

FIGURE 15.  SCATTERPLOT AND REGRESSION, POSTFLIGHT WORKLOAD AND
AUTOMATED PERFORMANCE MEASUREMENT — ALL PILOTS
(Axes: postflight workload factor; automated performance measurement using pilot performance index)

The final comparisons for this section of the report were those between the
postflight workload factor, which was produced from the pilots' questionnaire
responses, and the performance rating totals for each flight. In this comparison,
both masters and journeymen pilots produced significant (P<.05) correlations
between the two variables, and these correlations were very similar: r = -.505 for
masters and r = -.467 for journeymen. See figures 16 and 17 for the scatterplots.
Figure 18 shows the data when all pilots were considered on the same plot. The
correlation was r = -.710, and the coefficient of determination (r squared) was
0.504. This means that only about half the total variability was accounted for by
the regression line, as the scatter around the regression line in figure 18 shows.
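The coefficient of determination quoted above is simply r squared: (-0.710)^2 = 0.5041, or about 50 percent. For a simple linear regression, the same number equals 1 - SS_residual / SS_total, the share of the variance removed by the regression line. The Python sketch below checks that identity on hypothetical standardized data. The function names are illustrative.

```python
import math


def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def variance_accounted_for(xs, ys):
    """1 - SS_residual / SS_total for the simple linear regression of ys on xs."""
    slope, intercept = linear_fit(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


print(round((-0.710) ** 2, 4))  # 0.5041, the coefficient of determination in the text

# Hypothetical standardized scores: r squared and the regression's share agree.
xs = [-1.2, -0.5, 0.1, 0.4, 0.9, 1.3]
ys = [0.8, 0.9, -0.2, 0.3, -0.7, -0.4]
r = pearson_r(xs, ys)
print(round(r * r, 6), round(variance_accounted_for(xs, ys), 6))
```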

There appears to be a relationship between a pilot's perception of workload and
his or her performance in flight. This relationship holds across measurement
methods when there is a spread of piloting talent in the participant sample. The
relationship, which is represented by a negative correlation, indicates that less
experienced pilots feel they are working harder but are apparently performing
more poorly than their more experienced colleagues. The relationship is not
perfect even where it is strongest, and this needs further research.


DISCUSSION 


Throughout the history of person-machine systems, there have been many attempts to
isolate and measure performance. Aviation has presented unique problems because of
its complexity and pace of activity. The current research has evaluated an APM
System for use in general aviation simulation research.

Twenty-four  pilots  participated  in  this  simulation-based  study.  Although  they  may 
or  may  not  have  been  representative  of  general  aviation  at  large,  their  respective 
performances  can  serve  as  a  viable  indication  of  the  potential  of  this  APM  System. 

The PPI was developed analytically by a small group of subject matter experts based
on their experience and flight knowledge. The PPI was based on an implied flight
task taxonomy built around segments of flight and variables within segments.
The analytic product from the subject matter experts was honed using the
master-journeyman design. This approach was based on the assumption that
experienced pilots should perform better in flight and that any measurement system
should be able to discriminate them from their less experienced colleagues.
Initial analyses screened out those variables that did not separate the two
groups and also those that showed a large performance change between flights,
indicating a learning or immediate experience effect. The results showed that the
revised PPI discriminated between the two groups of pilots, and for the most
part, the separation was great.
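The screening step described above can be sketched as a two-condition filter: keep a variable only if it separates masters from journeymen and changes little from the first flight to the second. In the Python sketch below, the effect-size measure (Cohen's d with a pooled standard deviation), both cutoffs, the variable names, and the data are all hypothetical. The report does not state the criteria it used. Also, because the same pilots flew both flights, a paired comparison would be a more faithful test of the flight change than the independent-groups d used here for simplicity.

```python
from statistics import mean, stdev


def standardized_difference(a, b):
    """Mean difference divided by the pooled standard deviation (Cohen's d)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def screen_variables(data, min_group_d=0.8, max_flight_d=0.5):
    """Keep variables that separate the groups but change little between flights.

    data maps a variable name to a dict of score lists keyed by 'masters',
    'journeymen', 'flight1' and 'flight2'. Both thresholds are hypothetical.
    """
    kept = []
    for name, d in data.items():
        separates = abs(standardized_difference(d["journeymen"], d["masters"])) >= min_group_d
        stable = abs(standardized_difference(d["flight2"], d["flight1"])) <= max_flight_d
        if separates and stable:
            kept.append(name)
    return kept


# Hypothetical deviation scores for three candidate variables.
candidates = {
    "glideslope": {"masters": [1, 2, 1, 2], "journeymen": [4, 5, 4, 5],
                   "flight1": [2, 3, 4], "flight2": [2, 3, 4]},
    "heading": {"masters": [1, 2, 3], "journeymen": [1, 2, 3],
                "flight1": [2, 3, 4], "flight2": [2, 3, 4]},
    "altitude": {"masters": [1, 2, 1, 2], "journeymen": [4, 5, 4, 5],
                 "flight1": [1, 2, 3], "flight2": [5, 6, 7]},
}
print(screen_variables(candidates))  # ['glideslope']
```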

Despite this performance differential, the two groups proceeded across the flight
segments with a similar pattern: descent was the segment of poorest performance
and final approach the best. Descent is a transition segment where many things
are occurring, with a very dynamic sequence of demands placed on the pilot. In
final approach, communication and planning are minimal, and the pilot primarily
has to hold the aircraft on the Instrument Landing System (ILS). This could be a
classic example of how the time-sharing requirement, an element of workload,
affects performance. When the pilot can concentrate on one primary task,
performance as measured by the PPI comes closest to the standards.


FIGURE 16.  SCATTERPLOT AND REGRESSION, POSTFLIGHT WORKLOAD FACTOR
AND PERFORMANCE RATING TOTALS — MASTER PILOTS
(Axes: postflight workload factor; performance rating total)

FIGURE 17.  SCATTERPLOT AND REGRESSION, POSTFLIGHT WORKLOAD FACTOR
AND PERFORMANCE RATING TOTALS — JOURNEYMAN PILOTS
(Axes: postflight workload factor; performance rating total)

FIGURE 18.  SCATTERPLOT AND REGRESSION, POSTFLIGHT WORKLOAD FACTOR
AND PERFORMANCE RATING TOTALS — ALL PILOTS
(Axes: postflight workload factor; performance rating total)

Observer performance ratings were also collected, for a number of reasons.
Several references in the literature stress the importance of examining
performance from multiple perspectives. Also, performance rating is an established
tradition in aviation, and it could serve (if reliable) as an indicator of
concurrent validity for the APM data.

The reliability of the ratings on individual scales within segments of flight was
mediocre, especially for the journeymen pilots. However, when the scale data were
pooled to produce segment scores for each flight, the reliability as measured by
interrater correlations was excellent. The results from the independent raters
were pooled and used for subsequent analyses. This led to an outcome very similar
to that achieved for the PPI collected via APM: the two pilot groups were neatly
separated, and there was variability across flight segments. The pattern across
the segments differed somewhat for the two groups, and the relative order of the
segments was quite different from that of the PPI data. For example, for both
groups, the observers rated final approach as the segment with the worst
performance, whereas it was the best segment according to the PPI. Clearly, the
PPI and the observers were tuned to different sources of information when
evaluating performance down to the segment level. The PPI was measured against
fixed, predetermined standards. The observers each rated according to internalized
standards developed from personal experience and shared agreements established
during observer training. This is a classic example of how results can be
influenced by the measurement technique, even though both methods produced
practically identical overall results.
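The pooling described above can be illustrated as follows. Each rater's individual scale ratings within a segment are summed into a segment score, the two raters' segment scores are correlated (an interrater correlation), and the raters' scores are then averaged for later analysis. The Python sketch below assumes two raters and a hypothetical data layout. Summing the scales and averaging the raters are assumptions about how the pooling was done, since the text does not give the arithmetic.

```python
import math


def segment_scores(ratings):
    """Sum the individual scale ratings within each flight segment.

    ratings: one list of scale ratings per (pilot, flight, segment).
    """
    return [sum(scales) for scales in ratings]


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def pool_raters(scores_a, scores_b):
    """Average two raters' segment scores for use in later analyses."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


# Hypothetical ratings: four pilot-flight-segments, three scales each.
rater_a = [[3, 4, 2], [5, 5, 4], [2, 2, 1], [4, 3, 4]]
rater_b = [[3, 3, 2], [5, 4, 5], [1, 2, 2], [4, 4, 3]]
seg_a, seg_b = segment_scores(rater_a), segment_scores(rater_b)
print(round(pearson_r(seg_a, seg_b), 3))  # interrater correlation of segment scores
print(pool_raters(seg_a, seg_b))
```

Summing several scales tends to cancel out scale-level disagreements between raters, which is one plausible reason the pooled segment scores were far more reliable than the individual scales.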

Despite  every  effort  to  avoid  an  interflight  performance  change,  both  methods  of 
measurement  showed  a  significant  improvement  between  flights.  Although  these 
effects  were  significant,  they  were  of  small  magnitude  and  accounted  for  very 
little  variance.  They  were  probably  a  function  of  route  and  air  traffic  control 
familiarity  the  second  time  each  pilot  flew  the  same  scenario.  The  only  way  to 
avoid this would have been to use a different but comparable flight plan, which may
have  confounded  the  results  in  some  other  fashion. 

Pilot workload was measured in two ways during this project: inflight, using a
real-time response box, and postflight, using a questionnaire. Both measures,
which were of the subjective self-report type, demonstrated a difference between
the two pilot groups: the journeymen pilots reported consistently higher workload.
Both measures showed a decrease in workload from the first to the second flight.
As the pilots became more familiar with the specific flight geometry, their
perceived workload decreased. Both groups of pilots reported they were working
harder during initial and final approaches than during en route flight. One
would expect workload to be higher in these transition segments than in the
relatively stable en route environment.

The inflight and postflight measures of workload were highly related for the
master pilots and for the entire participant sample. When the journeymen were
considered alone, however, the relationship was weaker. Apparently, when the
difficulty for a pilot group is high, as it probably was for the journeymen,
workload is perceived differently while actually performing than after completing
the task or landing the aircraft. The masters group produced a higher level of
performance with a lower perceived workload. It is logical that a highly
experienced pilot's work would be easier than that of one who is less
experienced. The former has overlearned many key behaviors, while the journeyman
must invest thought and trial and error in order to accomplish a task. It would
appear that, given the wide separation of flying hours between the two
participant groups, experience does count when it comes to workload. There is no
way to generalize this conclusion to cases where the experience separation
between groups is smaller (e.g., 1,000 hours versus 2,000 hours) than it was in
this experiment; further study would be needed.

A series of scatterplots and correlations was presented in the "Comparison Between
Key Variables" section. The PPI produced by automated performance measurement was
able to spread the individual performance of masters pilots better than the rating
system. The performance of the masters group pilots appeared more homogeneous to
the raters, and separating them required finer levels of discrimination than the
raters were capable of. For correlation to function as an index of relationship,
both variables must be spread over a continuum. The lack of spread in the ratings
for the masters lowered the correlation. However, when all participants were
considered, the PPI and the ratings were well correlated, indicating that both
measures tend to order performance in similar ways. This would be less likely if
the comparison were made on a segment-by-segment basis. The two measures are most
similar in overall flight performance evaluation and less similar when
comparisons are made within flights.
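The point that a correlation needs spread on both variables can be demonstrated by simulation. Draw scores with a known correlation, then keep only a narrow, high-scoring slice, roughly analogous to a homogeneous group such as the masters. The Python sketch below uses synthetic normal data. The correlation of 0.8 and the cutoff of 1.0 are arbitrary assumptions, not values from the study. The correlation within the slice comes out well below the full-sample value even though the underlying relationship is unchanged.

```python
import math
import random


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def correlated_sample(n, rho, seed):
    """n (x, y) pairs from a bivariate normal distribution with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs


pairs = correlated_sample(2000, rho=0.8, seed=7)
full_r = pearson_r([x for x, _ in pairs], [y for _, y in pairs])

# Keep only the top slice of the x distribution: a narrower, more homogeneous group.
top = [(x, y) for x, y in pairs if x > 1.0]
restricted_r = pearson_r([x for x, _ in top], [y for _, y in top])
print(f"full sample r = {full_r:.2f}, restricted sample r = {restricted_r:.2f}")
```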

Comparisons were also made between workload and performance measures, an area
that has not been seriously considered in other research studies. When comparing
the PPI data with inflight workload, there was no relationship for the masters
group and a mild negative relationship for the journeymen. When the entire sample
was considered, a moderate negative correlation (r = -.567) appeared. This
indicated that workload was lower for those performing better (generally the
masters pilots). This agrees with the results on workload and performance already
discussed. The results were very similar for the postflight questionnaire.

The postflight workload factor was a composite of the four questionnaire items,
produced by factor analysis. It correlated moderately well with observer ratings.
These correlations were also negative, indicating an association of higher
performance with lower workload: the journeymen were working harder to produce
less.

This study represented a unique situation in that there was a large separation
between the two subgroups in terms of experience. The purpose of this separation
was to give the various measurement systems an opportunity to perform, and they
did. However, the relationship between workload and performance will require
further study with a more representative sample of pilot experience and/or a wider
dispersion of workload conditions induced by varying degrees of flight difficulty.




CONCLUSIONS 


An Automated Performance Measurement (APM) System using the Pilot Performance
Index (PPI), developed at the FAA Technical Center, was successfully tested in an
initial evaluation, with the following results:

1. The APM System was more effective than observer rating in spreading the
performances of experienced pilots.

2. While APM and observer ratings separated the two pilot groups in terms of
overall flight performance, they differed considerably when separation was
examined at a more molecular, flight-segment level.

3. Masters pilots reported consistently lower workload and produced consistently
better overall flight performance than the journeymen.

4. There appears to be an inverse relationship between workload and performance
when the participant sample is heterogeneous.




REFERENCES 


1. Berliner, D. C., Angell, D., and Shearer, J. W., Behaviors, Measures and
Instruments for Performance Evaluation in Simulated Environments. Proceedings
of the Symposium and Workshop on the Quantification of Human Performance,
August 1964, 227-296.

2. Brictson, C. A., McHugh, W., and Naitoh, P., Prediction of Pilot Performance:
Biochemical and Sleep Mood Correlates Under High Workload Conditions. Proceedings
of the AGARD Conference on Simulation and Study of High Workload Conditions,
AGARD-CP-146, October 1974 (NTIS No. A13-1-A13-8).

3. Childs, J. M., Development of an Objective Grading System Along With
Procedures and Aids for Its Effective Implementation in Flight, Research
Memorandum, Canyon Research Group, Ft. Rucker, Alabama, May 1979.

4. Christensen, J. M., and Mills, R. G., What Does the Operator Do in Complex
Systems? Human Factors, 1967, 9, 329-340.

5. Connelly, E. A., Schuler, A. R., and Knoop, P. A., Study of Adaptive
Mathematical Models for Deriving Automated Pilot Performance Measurement
Techniques, Vols. 1 & 2, Air Force Human Resources Laboratory Technical Report
(AFHRL-TR-69-7), 1969.

6. Damos, A., and Lintern, A., A Comparison of Single and Dual Task Measures
to Predict Pilot Performance, Air Force Office of Scientific Research Technical
Report (AFOSR-79-2), Bolling AFB, D.C., May 1979 (NTIS AD A084-237).

7. Engel, J. D., An Approach to Standardizing Human Performance Measurement,
Human Resources Research Organization Professional Paper 26-70, March 1970
(NTIS AD 717258).

8. Fleishman, E., Performance Assessment Based on an Empirically Derived Task
Taxonomy, Human Factors, 1967, 349-366.

9. Fleishman, E. A., Systems for Describing Human Tasks. American Psychologist,
1982, 37(7), 821-834.

10. Fuller, J. H., Waag, W. L., and Martin, E. L., Advanced Simulator for Pilot
Training: Design of an Automated Performance Measurement System, Air Force Human
Resources Laboratory Technical Report (AFHRL-TR-79-57), August 1980.

11. Furrell, J. P., Measurement Criteria in the Assessment of Helicopter Pilot
Performance, paper presented at the conference on Aircrew Performance in Army
Aviation, U.S. Army Aviation Center, Ft. Rucker, Alabama, November 1973.

12. Gerathewohl, S. J., Psychophysical Effects of Aging — Developing a
Functional Age Index for Pilots: II, Federal Aviation Administration Technical
Report (FAA-AM-78-16), April 1978b (NTIS AD A059-356).

13. Gerathewohl, S. J., Psychophysiological Effects of Aging — Developing a
Functional Age Index for Pilots: III - Measurements of Pilot Performance, Federal
Aviation Administration Technical Report (FAA-AM-78-27), August 1978a
(NTIS AD-A062501).

14. Gondek, P. C., What You See May Not Be What You Think You Get: Discriminant
Analysis in Statistical Packages, Educational and Psychological Measurement, 1981,
41, 267-281.

15. Henry, P. H., Turner, R. A., and Matthie, R. B., An Automated System to Assess
Pilot Performance in a Link GAT 1 Trainer, U.S. Air Force School of Aerospace
Medicine Technical Report (SAM-TR-74-41), Brooks AFB, Texas, October 1974
(NTIS AD/A-004780).

16. Hill, J. W., and Eddowes, E. E., Further Development of Automated GAT 1
Performance Measures, Air Force Human Resources Laboratory Technical Report
(AFHRL-TR-73-72), Brooks AFB, Texas, May 1974 (NTIS AD-783240).

17. Hill, J. W., and Goebel, R. A., Development of Automated GAT-1 Performance
Measures, Air Force Human Resources Laboratory Technical Report (AFHRL-TR-71-8),
Williams AFB, Arizona, May 1971 (NTIS AD 732616).

18. Knoop, P. A., and Welde, W. L., Automated Pilot Performance Assessment in
the T-37: A Feasibility Study, Air Force Human Resources Laboratory Technical
Report (AFHRL-TR-72-6), Wright-Patterson AFB, Ohio, April 1973 (NTIS AD-766446).

19. Leibowitz, H. W., and Post, R. B., Capabilities and Limitations of the
Human Being as a Sensor. In J. T. Kuznicki and R. A. Johnson (Eds.), Problems
and Approaches to Measuring Hedonics, Baltimore, American Society for Testing and
Materials, 1982.

20. Linton, M., and Gallo, P. S., The Practical Statistician, Monterey,
Brooks-Cole, 1975.

21. McDowell, E. A., The Development and Evaluation of Objective Frequency Domain
Based Pilot Performance Measure in the ASUPT, Air Force Office of Scientific
Research Technical Report (AFOSR-TR-78-1239), Bolling AFB, D.C., April 1978
(NTIS AD-A0599477).

22. Melton, C. E., McKensie, J. R., Kellin, J. R., and Saldivar, J. T., Effect
of a General Aviation Trainer on the Stress of Flight Training. Aviation, Space,
and Environmental Medicine, 1975, 46(1), 1-5.

23. Moray, N., Subjective Mental Workload, Human Factors, 1982, 24(1), 25-40.

24. North, R. A., and Griffin, G. R., Aviator Selection 1919-1977, Naval Aerospace
Medical Research Laboratory Technical Report, Pensacola, Florida, October 1977
(NTIS ADA 048105).

25. Obermayer, R. W., and Vreuls, R., Combat Ready Crew Performance Measurement
System: Phase I Measurement Requirements, Air Force Human Resources Laboratory
Technical Report (AFHRL-TR-74-108(II)), Brooks AFB, Texas, December 1974,
(NTIS AD B005518L).

26. Poulton, E. C., Observer Bias, Applied Ergonomics, 1975, 6, 3-8.

27. Povenmire, H. K., Alvares, K. M., and Damos, D. L., Observer-Observer
Flight Check Reliability, University of Illinois Aviation Research Laboratory
Technical Report (LF-70-2), Savoy, Ill., October 1970.

28. Rault, A., Measurement of Pilot Workload. In N. Moray (Ed.), Mental Workload,
New York, Plenum, 1979.

29. Roscoe, A. H., Introduction to AGARD Monograph, Assessing Pilot Workload,
Harford House, London, February 1978.

30. Rosenberg, B., Rehmann, J., and Stein, E. S., The Relationship Between Effort
Rating and Performance in a Critical Tracking Task, FAA Technical Center Technical
Report (DOT/FAA/EM-81/13), Atlantic City, N.J., October 1982.

31.  Shannon,  R.  H.,  Task  Analytic  Approach  to  Human  Performance  Battery 
Development.  Proceedings  of  the  Human  Factors  Society  24th  Annual  Meeting,  1980a. 

32. Shannon, R. H., The Validity of Task Analytic Information to Human Performance
in Unusual Environments. Proceedings of the Human Factors Society 24th Annual
Meeting, 1980b.

33. Sheridan, T. B., and Simpson, R. W., Toward the Definition and Measurement
of the Mental Workload of Transport Pilots, Massachusetts Institute of Technology
Final Report, 1979.

34. Skjenna, O. W., Cause Factor: Human, A Treatise on Rotary Wing Human
Factors, Ministry of National Health and Welfare (Canada) Technical Report,
Ottawa, 1981.

35. Smith, H. P. R., A Simulator Study of the Interaction of Pilot Workload
With Errors, Vigilance and Decisions, NASA Technical Memorandum (78482), Ames
Research Center, January 1979, (NTIS N79-14769).

36. Stein, E. S., and Rosenberg, B., The Measurement of Pilot Workload, FAA
Technical Center Technical Report (DOT/FAA/EM-81/14), Atlantic City, N.J.,
January 1983.

37.  Vreuls,  A.,  and  Obermayer,  R.  W.,  Selection  and  Development  of  Automated 
Performance  Measurement,  paper  presented  at  conference  on  Aircrew  Performance  in 
Army  Aviation,  U.S.  Army  Aviation  Center,  Ft.  Rucker,  Alabama,  November  1973. 

38.  Vroom,  V.  H. ,  Work  and  Motivation,  New  York,  Wiley,  1964. 


64 


APPENDIX  A 


LESSON  PLANS 


TRAINING 


1.0 hour flight

:15 preflight
:15 postflight


OBJECTIVE:

To acquaint the participant with normal multiengine procedures
and techniques. The participant will develop the abilities
required to execute safe take-offs and landings under all normal
conditions.  Standard  coordination  and  planning  maneuvers  will 
be  demonstrated  and  practiced  to  develop  pilot  familiarity  with 
the  performance  and  flight  control  responses  in  the  General 
Aviation  Cockpit  Simulator.  Standard  attitude  instrument  flight 
training  maneuvers  will  be  performed  to  develop  accuracy  and 
control. 


LESSON CONTENTS:

1.  Preflight  discussion 

2.  Cockpit  familiarization 

3.  Normal  take-off 

4.  Aircraft  familiarization  maneuvers 

A.  Straight  and  level  cruise 

B.  Climbs,  climbing  turns,  and  level  offs 

C.  Descents,  descending  turns,  and  level  offs 

D. Establishing cruise and cruise operations

E. Landing gear and flap effect on aircraft

F. Slow flight

*G. Stall recognition and recovery techniques

1. Take-off configuration

2. Clean configuration

3. Landing configuration

H. Steep turns, 45 degree bank, and 360 degree turns left and
right

* At least one of the following maneuvers will be at a
bank angle of between 15 and 30 degrees.

5.  Instrument  review 

A.  Area  departure  and  area  arrival 

B.  VOR  holding 

C. VOR and ILS approach(es) and missed approach(es)

6.  Landing 

7. Postflight discussion


COMPLETION STANDARDS:

The participant shall be familiar with the airplane systems,
limitations, performance, and normal operating procedures. The
pilot should perform all standard coordination maneuvers without
deflecting the ball in the ball-bank indicator outside the center
reference line. Turns are to be within 10 degrees of assigned heading,
altitude within 100 feet of assigned altitude, and airspeed within
10 knots of assigned airspeed. Stall recovery performance will
be evaluated on the basis of prompt recognition and smooth,
positive recovery action with a minimum loss of altitude consistent
with the recovery of full control effectiveness. After recovery,
the pilot will make an expeditious return to the original altitude.
Take-offs and landings will be evaluated on the basis of technique,
judgment, speeds per aircraft flight manual, coordination, and
smoothness. The instrument review will be evaluated on the pilot's
knowledge, skill, and ability to operate the multiengine aircraft
under normal instrument conditions. Area departure and arrival
will be in accordance with published area information, i.e., SIDs
and STARs. Holding patterns will be entered correctly and flown within
10 knots of the proper holding airspeed. Approaches will be
completed while maintaining the correct approach speed within 10
knots and the initial approach altitude within 100 feet. The missed
approach procedures will be followed per instructions, with the
pilot demonstrating full and correct control of the aircraft and
procedures.

At the completion of this lesson, the participant will demonstrate
attitude instrument flight under normal conditions while maintaining
altitude within 100 feet and heading within 10 degrees during
straight and level flight. Turns will be performed maintaining
altitude within 100 feet and roll-outs to predetermined headings
within 10 degrees. Climbs and descents will be performed within
10 knots of the desired airspeed, and level-offs will be completed
within 100 feet of the assigned altitude. The approaches will be
completed while maintaining the correct approach speed within 10
knots and the initial approach altitude within 100 feet. The
pilot will be able to level off at the MDA or DH and conduct
accurate missed approach procedures.
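The numeric limits in these completion standards lend themselves to a simple pass/fail check. The sketch below is a hypothetical illustration, not the evaluation procedure used in this project: it assumes one snapshot of assigned and observed values (the `Snapshot` record and the function names are invented here) and applies the lesson limits of 100 feet of altitude, 10 degrees of heading, and 10 knots of airspeed. Heading error is wrapped at 360 degrees, so an assigned 355° against an observed 004° counts as a 9° error rather than 351°.

```python
from dataclasses import dataclass

# Tolerances taken from the lesson completion standards.
ALTITUDE_TOL_FT = 100.0
HEADING_TOL_DEG = 10.0
AIRSPEED_TOL_KT = 10.0


def heading_error(assigned: float, actual: float) -> float:
    """Smallest angular difference in degrees (0-180), wrapping at 360."""
    diff = abs(actual - assigned) % 360.0
    return min(diff, 360.0 - diff)


@dataclass
class Snapshot:
    # Hypothetical record: one assigned/observed pair per parameter.
    assigned_alt_ft: float
    actual_alt_ft: float
    assigned_hdg_deg: float
    actual_hdg_deg: float
    assigned_ias_kt: float
    actual_ias_kt: float


def meets_standards(s: Snapshot) -> dict[str, bool]:
    """Return a pass/fail flag for each completion-standard tolerance."""
    return {
        "altitude": abs(s.actual_alt_ft - s.assigned_alt_ft) <= ALTITUDE_TOL_FT,
        "heading": heading_error(s.assigned_hdg_deg, s.actual_hdg_deg) <= HEADING_TOL_DEG,
        "airspeed": abs(s.actual_ias_kt - s.assigned_ias_kt) <= AIRSPEED_TOL_KT,
    }


if __name__ == "__main__":
    # 80 ft high, 9 degrees off heading, 12 knots fast: airspeed fails.
    print(meets_standards(Snapshot(3000, 3080, 355, 4, 175, 187)))
```

A snapshot check of this kind says nothing about how long a pilot stayed within limits; it only mirrors the wording of the standards.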


A-2 


TRAINING 


1.0  hour  flight 


:15 preflight
:15 postflight


OBJECTIVE:

To  acquaint  the  participant  with  normal  multiengine  procedures 
and  techniques.  The  participant  will  develop  the  abilities 
required  to  execute  safe  take-offs  and  landings  under  all  normal 
conditions.  Standard  coordination  and  planning  maneuvers  will 
be demonstrated and practiced to develop pilot familiarity with
the  performance  and  flight  control  responses  in  the  General 
Aviation  Cockpit  Simulator.  Standard  attitude  instrument  flight 
training  maneuvers  will  be  performed  to  develop  accuracy  and 
control. 


LESSON  CONTENTS: 

1.  Preflight  discussion 

2. Cockpit familiarization

3.  Normal  take-off 

4.  Aircraft  familiarization  maneuvers 

A.  Straight  and  level  cruise 

B.  Climbs,  climbing  turns,  and  level  offs 

C.  Descents,  descending  turns,  and  level  offs 

D.  Establishing  cruise  and  cruise  operations 

E. Landing gear and flap effect on aircraft

F.  Slow  flight 

*G.  Stall  recognition  and  recovery  techniques 

1.  Take-off  configuration 

2.  Clean  configuration 

3.  Landing  configuration 

H. Steep turns, 45 degree bank, and 360 degree turns left
and right

* At least one of the following maneuvers will be at a
bank angle of between 15 and 30 degrees.

5.  Landing 

6. Postflight discussion


COMPLETION STANDARDS:

The participant shall be familiar with the airplane systems,
limitations, performance, and normal operating procedures. The
pilot should perform all standard coordination maneuvers without
deflecting the ball in the ball-bank indicator outside the center
reference line. Turns are to be within 10 degrees of assigned heading,
altitude within 100 feet of assigned altitude, and airspeed within
10 knots of assigned airspeed. Stall recovery performance will be
evaluated on the basis of prompt recognition and smooth, positive
recovery action with a minimum loss of altitude consistent with
the recovery of full control effectiveness. After recovery, the
pilot will make an expeditious return to the original altitude.
Take-offs and landings will be evaluated on the basis of technique,
judgment, speeds per aircraft flight manual, coordination, and
smoothness.


At  the  completion  of  this  lesson,  the  participant  will  demonstrate 
attitude instrument flight under normal conditions while maintaining
altitude  within  100  feet  and  heading  within  10  degrees  during 
straight  and  level  flight.  Turns  will  be  performed  maintaining 
altitude  within  100  feet  and  roll-outs  to  predetermined  headings 
within  10  degrees.  Climbs  and  descents  will  be  performed  within 
10  knots  of  the  desired  airspeed  and  level  offs  will  be  completed 
within  100  feet  of  the  assigned  altitude. 


A-3 


APPENDIX  B 


TRAINING  BRIEFING  AND  TRAINING  PROGRAM 

TRAINING  BRIEFING 

This will be a training flight in preparation for a flight in
which data will be collected. We will be looking at your
professional approach to this flight. We will go through a cockpit
checkout using the simulator checklist. We will take off after
receiving a brief air traffic control (ATC) clearance and climb
to altitude, where we will do some airwork, starting with some
turns at various bank angles, i.e., 20°, 30°, and 45° banks,
for 360° of turn. We will then do a stall series, beginning with
a power-off clean configuration, then a climbing turn stall (with
climb power set and standard rate turns), also at 45° bank, then go
to the dirty or landing configuration and repeat the stall series.
When this is complete, we will maintain an assigned altitude and
go directly to SIE VOR and hold. We will hold on the 090° radial
with standard turns. We will then get vectors for a VOR approach
to runway at Atlantic City. We will make a missed approach off
of runway 4, then will receive a vector for an ILS approach to
runway 13 to a full stop.
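The briefing and the later rating forms refer repeatedly to standard rate turns, i.e., turns at 3 degrees per second. For a coordinated, level turn the required bank angle follows from tan(φ) = Vω/g, where V is true airspeed, ω the turn rate, and g gravity. The sketch below is an illustration added here, not part of the project's procedures; it assumes the input is true airspeed, whereas the briefing quotes indicated airspeeds.

```python
import math

G_MS2 = 9.80665              # standard gravity, m/s^2
KT_TO_MS = 1852.0 / 3600.0   # one knot expressed in m/s
STANDARD_RATE_DEG_S = 3.0    # standard rate turn: 3 degrees per second


def standard_rate_bank_deg(tas_kt: float) -> float:
    """Bank angle for a coordinated, level standard rate turn: tan(phi) = V * omega / g."""
    v = tas_kt * KT_TO_MS                      # true airspeed in m/s
    omega = math.radians(STANDARD_RATE_DEG_S)  # turn rate in rad/s
    return math.degrees(math.atan(v * omega / G_MS2))


if __name__ == "__main__":
    # Cruise and final approach speeds from the briefing, treated as TAS for illustration.
    for speed in (175, 140, 115):
        print(speed, round(standard_rate_bank_deg(speed), 1))
```

Because the required bank grows with speed, a fixed bank limit would be too steep at approach speeds and too shallow at cruise, which is presumably why the forms phrase the criterion as "bank for a standard rate turn" rather than as a fixed angle.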

Points that the project people will be grading during your flight
will be:

1. Assigned altitude ±100 feet

2. Heading on take off ±2° of runway heading

3. Pitch attitude on take off (10° nose up)

4. Airspeed ±5 knots (175 cruise)

5. Standard rate turns

6. Initial approach speed (140 knots)

7. Final approach speed (115 knots)
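The briefing states tolerance bands but not how they were turned into scores. One common way to apply such a band to recorded data is the percentage of samples that fall inside it. The sketch below is a hypothetical example of that idea, not the grading method used by the project staff; the function name and the sample traces are invented for illustration.

```python
from typing import Sequence


def percent_within(samples: Sequence[float], target: float, tol: float) -> float:
    """Percentage of samples lying within target +/- tol (inclusive)."""
    if not samples:
        raise ValueError("no samples")
    hits = sum(1 for x in samples if abs(x - target) <= tol)
    return 100.0 * hits / len(samples)


if __name__ == "__main__":
    # Hypothetical altitude trace (feet) against an assigned 3,000 ft, +/-100 ft.
    altitude = [3000, 3040, 3110, 2950, 2890, 3020]
    print(percent_within(altitude, 3000, 100))
    # Hypothetical cruise airspeed trace (knots IAS) against 175, +/-5 knots.
    ias = [175, 178, 181, 172, 176]
    print(percent_within(ias, 175, 5))
```

The same function covers the symmetric bands in the list (altitude, take-off heading, cruise airspeed); it does not handle heading wrap-around at 360 degrees.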

B-1 


TRAINING PROGRAM


Each participant is given training flights before data are collected.
There are two levels of pilots: (1) Masters and (2)
Journeymen. The Masters group will receive one training flight,
and the Journeymen will receive three training flights of 1 hour
each.

First  Lesson 

1. Cockpit familiarization (explanation of all radio and instrument
equipment except flight director and autopilot)

2. Takeoff procedures

3. Series of maneuvers

A. Straight and level

B. Turns at different bank angles: 10°, 20°, 30°, 40°

C. Stalls, clean and dirty

D. Speed changes (power setting)

E. Series of landings and takeoffs with missed approaches

Second  Lesson 

Simple ATC clearance via V-44, Leah, V-ie6, SIE; hold at SIE;
vectors for VOR approach at Atlantic City; missed approach;
vectors ILS.

Third Lesson

Review of Lesson 1 and approaches at Atlantic City to complete
the hour.

The  objective  is  to  fly  the  simulator  as  a  real  aircraft  using 
all  the  normal  procedures  for  IFR  flight  and  for  our  project 
purposes  we  must  fly  as  close  as  possible  to  the  parameters  given. 


Initial T.O. roll: runway heading ±2 degrees; at 95 knots pitch up
to 10°; gear up, flaps up; maintain 125 knots.

VMC = 80 knots
VR = 95 knots
VYSE = 111 knots

Power Settings

T.O. Power:        2275 RPM, 39.5" Hg MAP
Climb Power:       1900 RPM, 35" Hg MAP
Cruise Power:      1900 RPM, 32" Hg MAP (175 knots IAS)
Initial approach:  140 knots IAS, 1900 RPM, approximately 22-23" Hg manifold pressure
Final approach:    115 knots IAS, 1900 RPM, MAP as required


B-3 


APPENDIX  C 


LIST  OF  GAT  VARIABLES 


ITEM  NAME                          SOURCE        UNITS

 1    [illegible]                   530           count
 2    TIME                          530           1 count/sec
 3    SEGMENT NUMBER                530
 4    X-POSITION                    GAT/NSSP 1    [illegible]
 5    Y-POSITION                    NSSP 2        LSB*64
 6    Z-POSITION                    NSSP 3        LSB*16
 7    PITCH ANGLE (THETA)           NSSP 4        .0055 degrees
 8    ROLL ANGLE                    NSSP 5        .0055 degrees
 9    HEADING                       NSSP 6        .1 degrees
10    INDICATED AIRSPEED (IAS)      NSSP 7        .1879 knots
11    TRUE AIRSPEED (TAS)           NSSP 8        .1879 knots
12    RATE OF CLIMB                 NSSP 9        ft/min
13    ANGLE OF ATTACK (ALPHA)       NSSP 10       .0093 degrees
14    SIDESLIP ANGLE (BETA)         NSSP 11       .0146 degrees
15    FLIGHT PATH ANGLE             CALCULATED    [illegible]
16    WIND ANGLE                    GAT           degrees
17    WIND MAGNITUDE                GAT           knots
18    PITCH RATE                    [illegible]   [illegible] deg/sec
19    ROLL RATE                     NSSP 13       .0293 deg/sec
20    YAW RATE                      [illegible]   deg/sec
21    [illegible] DEFLECTION        GAT
22    WHEEL DEFLECTION              GAT
23    PEDAL DEFLECTION              GAT
24    NAV 1 FREQUENCY               [illegible]   coded (PE)
25    NAV 2 FREQUENCY               NSSP 14       coded (PE)
26    ADF 1 FREQUENCY               NSSP 15       coded (PE)
27    [illegible] FREQUENCY         [illegible]   [illegible]
28    XPNDR CODE ([illegible])      NSSP 17       coded (PE)
29    XPNDR MODES                   NSSP 18       coded (PE)
30    COMM 1 FREQUENCY              NSSP 19       coded (PE)
31    COMM 2 FREQUENCY              NSSP 20       coded (PE)
32    [illegible] XPNDR CODE        NSSP 21       [illegible]
33    RMI 1/[illegible]             NSSP 22       .1 degrees
34    [illegible]                   NSSP 23       .1 degrees
35    CDI 1 (ANGLE)                 NSSP 24       .1 degrees
36    CDI 1 (LINEAR)                NSSP 25       [illegible]
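Several entries in the variable list give a unit per recorded count, for example .0055 degrees for pitch angle and .1879 knots for airspeed. Assuming those figures are the weight of one least significant bit of a signed integer channel (the list itself does not say so explicitly), converting recorded counts to engineering units is a multiplication. The sketch below uses only the clearly legible scale factors; the channel keys and function names are invented for this example.

```python
# Per-count scale factors read from the GAT variable list. Treating them as
# the weight of one least significant bit is an assumption, not a documented fact.
SCALE = {
    "pitch_deg": 0.0055,         # item 7, pitch angle
    "roll_deg": 0.0055,          # item 8, roll angle
    "heading_deg": 0.1,          # item 9, heading
    "ias_kt": 0.1879,            # item 10, indicated airspeed
    "tas_kt": 0.1879,            # item 11, true airspeed
    "alpha_deg": 0.0093,         # item 13, angle of attack
    "beta_deg": 0.0146,          # item 14, sideslip angle
    "roll_rate_deg_s": 0.0293,   # item 19, roll rate
}


def to_engineering(name: str, raw_count: int) -> float:
    """Convert one raw recorded count to engineering units."""
    return raw_count * SCALE[name]


def decode_frame(frame: dict[str, int]) -> dict[str, float]:
    """Convert every channel in one sampled frame; an unknown channel raises KeyError."""
    return {name: to_engineering(name, count) for name, count in frame.items()}


if __name__ == "__main__":
    # Hypothetical raw frame: about 175 knots IAS, heading 090, 10 degrees nose up.
    print(decode_frame({"ias_kt": 931, "heading_deg": 900, "pitch_deg": 1818}))
```

Channels listed as "coded (PE)" (radio frequencies, transponder settings) are not simple scaled values and are deliberately left out of the sketch.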

C-1 


APPENDIX D


SEGMENT-(ENROUTE LEVEL) NO. 5  AVALO TO SIE


E-1  Pilot maintains course alignment, minimum CDI.

     CDI LARGE (10°)   1  2  3  4  5  6  7  8   CDI SMALL (0°)

E-2  Pilot maintains assigned altitude.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

E-3  Pilot maintains smooth pitch and bank corrections.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

E-4  Pilot maintains positive control.

     SELDOM   1  2  3  4  5  6  7  8   ALWAYS


D-5 


SEGMENT-(TURN) NO. 6  SIE


SEGMENT-(ENROUTE LEVEL) NO. 7  SIE TO BRIEF


E-1  Pilot maintains course alignment, minimum CDI.

     CDI LARGE (10°)   1  2  3  4  5  6  7  8   CDI SMALL (0°)

E-2  Pilot maintains assigned altitude.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

E-3  Pilot maintains smooth pitch and bank corrections.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

E-4  Pilot maintains positive control.

     SELDOM   1  2  3  4  5  6  7  8   ALWAYS


D-7 


SEGMENT-(TURN) NO. 8  BRIEF


SEGMENT-(DESCENT) NO. 9  BRIEF TO VCN


D-1  Pilot maintains smooth rate of descent.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

D-2  Pilot maintains bank angle at zero or, if required to turn,
     does not exceed bank for a standard rate turn.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

D-3  Pilot adjusts power for descent.

     YES (1)   NO (0)

D-4  Pilot maintains positive control.

     SELDOM   1  2  3  4  5  6  7  8   ALWAYS


D-9


SEGMENT-(TURN) NO. 10  VCN

T-1  Pilot initiates turns at correct point in the flight plan.

     YES (1)   NO (0)

T-2  Bank roll-in and roll-out are smooth.

     VERY ROUGH   1  2  3  4  5  6  7  8   VERY SMOOTH

T-3  A standard rate turn is made.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

T-4  Pilot maintains altitude during the turn.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

T-5  If you disagreed in question T-4, did the pilot make a
     correction immediately to the assigned altitude?

     YES (1)   NO (0)

T-7  Pilot rolls out on correct course/heading. Circle the number
     closest to the error at roll-out.

     ERROR HIGH   1  2  3  4  5  6  7  8   ERROR LOW


D-10


SEGMENT-(ENROUTE LEVEL) NO. 11  VCN TO JIMM2

E-1  Pilot maintains course alignment, minimum CDI.

     CDI LARGE (10°)   1  2  3  4  5  6  7  8   CDI SMALL (0°)

E-2  Pilot maintains assigned altitude.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

E-3  Pilot maintains smooth pitch and bank corrections.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

E-4  Pilot maintains positive control.

     SELDOM   1  2  3  4  5  6  7  8   ALWAYS


D-11 


SEGMENT-(TURN) NO. 12  JIMM2

T-1  Pilot initiates turns at correct point in the flight plan.

     YES (1)   NO (0)

T-2  Bank roll-in and roll-out are smooth.

     VERY ROUGH   1  2  3  4  5  6  7  8   VERY SMOOTH

T-3  A standard rate turn is made.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

T-4  Pilot maintains altitude during the turn.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

T-5  If you disagreed in question T-4, did the pilot make a
     correction immediately to the assigned altitude?

     YES (1)   NO (0)

T-7  Pilot rolls out on correct course/heading. Circle the number
     closest to the error at roll-out.

     ERROR HIGH   1  2  3  4  5  6  7  8   ERROR LOW


D-12 


SEGMENT-(FINAL APPROACH) NO. 13  JIMM2 TO ACY

F-1  Pilot intercepts and correctly turns on to final approach
     course.

     YES (1)   NO (0)

F-2  Pilot maintains smooth rate of descent.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

F-3  Pilot establishes appropriate approach airspeed.

     [Response scale (deviation in airspeed) illegible in source.]

F-4  Pilot maintains proper altitude to glideslope intercept.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

F-5  Pilot establishes and maintains appropriate glideslope
     alignment (VDI).

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE

F-6  Pilot establishes and maintains localizer alignment (CDI).

     FULL SCALE DEVIATION   1  2  3  4  5  6  7  8   ONE NEEDLE WIDTH DEVIATION

F-7  Pilot makes a smooth landing.

     STRONGLY DISAGREE   1  2  3  4  5  6  7  8   STRONGLY AGREE
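The rating forms mix 8-point scales with yes/no items scored 1 and 0. This part of the report does not say how the ratings were combined, so the following is only a hypothetical data-entry sketch: it records the item types printed on two of the segment forms, rejects missing or out-of-range responses, and, purely for illustration, averages the 8-point items of one segment. The names `Item`, `validate`, and `scale_mean` are invented for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Item:
    code: str   # item label printed on the form, e.g. "E-2"
    kind: str   # "scale8" (rating 1-8) or "yesno" (1 = yes, 0 = no)


# Item types as printed on the en route (level) and descent segment forms.
ENROUTE_ITEMS = [Item("E-1", "scale8"), Item("E-2", "scale8"),
                 Item("E-3", "scale8"), Item("E-4", "scale8")]
DESCENT_ITEMS = [Item("D-1", "scale8"), Item("D-2", "scale8"),
                 Item("D-3", "yesno"), Item("D-4", "scale8")]


def validate(items: list[Item], responses: dict[str, int]) -> None:
    """Raise ValueError if any response is missing or outside its allowed values."""
    for item in items:
        value = responses.get(item.code)
        allowed = range(1, 9) if item.kind == "scale8" else (0, 1)
        if value not in allowed:
            raise ValueError(f"{item.code}: invalid response {value!r}")


def scale_mean(items: list[Item], responses: dict[str, int]) -> float:
    """Hypothetical summary: mean of the 1-8 rating items only (yes/no items excluded)."""
    validate(items, responses)
    return mean(responses[i.code] for i in items if i.kind == "scale8")


if __name__ == "__main__":
    print(scale_mean(DESCENT_ITEMS, {"D-1": 6, "D-2": 7, "D-3": 1, "D-4": 8}))
```

Keeping the yes/no items out of the mean avoids mixing a 0-1 score with a 1-8 rating; any real analysis would need the scoring rules given elsewhere in the report.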


D-13 


APPENDIX E


PARTICIPANT BRIEFING


I WILL BRIEF YOU ON THE REASONS WHY WE ARE DOING THE RESEARCH AND
YOUR ROLE AS A PARTICIPANT. THE MEASUREMENT OF PILOT PERFORMANCE
HAS BEEN ACCOMPLISHED RATHER HAPHAZARDLY THROUGHOUT THE HISTORY OF
AVIATION. WE NEED TECHNIQUES TO EVALUATE THE IMPACT OF COCKPIT
CHANGES ON THE BEHAVIOR OF PILOTS. THE PURPOSE OF THIS STUDY IS TO
TRY OUT SOME MEASUREMENT IDEAS THAT WE DEVELOPED, WHICH MAY LATER BE
SCALED TO OUR GOAL. THE TRAINING PILOT HAS FAMILIARIZED YOU WITH THE
OPERATION OF THE GAT, A SIMULATION OF THE CESSNA 421. HIS PURPOSE WAS
NOT TO TEACH YOU HOW TO FLY, BUT RATHER TO ENSURE THAT YOU KNEW WHERE
EVERYTHING WAS AND KNEW HOW TO OPERATE ALL THE EQUIPMENT. YOU HAVE
BEEN SELECTED BECAUSE YOU HAVE A SPECIFIC AMOUNT OF EXPERIENCE, EITHER
AS A HIGH OR RELATIVELY LOW TIME PILOT.

THIS IS PART OF THE RESEARCH DESIGN, AND I CANNOT EXPLAIN IT FURTHER
UNTIL THE END OF THE EXPERIMENT. ANY QUESTIONS YOU HAVE WILL BE
ANSWERED AT THAT TIME. THE PILOT IN THE RIGHT SEAT OF THE AIRCRAFT
WILL BE COMPLETING A PERFORMANCE EVALUATION FORM DURING EACH FLIGHT
AND IS NOT ALLOWED TO ANSWER ANY QUESTIONS OR PROVIDE FEEDBACK. AT THE
COMPLETION OF THE SECOND FLIGHT HE MAY THEN ANSWER YOUR QUESTIONS. YOU
WILL ALSO NOTE THAT WE ARE TAPING THE INSTRUMENT PANEL DURING EACH
TEST FLIGHT. THIS IS FOR POSTFLIGHT EVALUATION.

YOUR NAME WILL NOT APPEAR ON ANY OF OUR FORMS. YOU HAVE BEEN ASSIGNED
AN ARBITRARY NUMBER. AFTER WE COLLECT THE DATA, ALL REFERENCE TO YOU
AS AN INDIVIDUAL WILL BE DELETED. WE ARE NOT EVALUATING YOU; RATHER,
YOU ARE HELPING US EVALUATE OUR MEASUREMENT SYSTEM. YOU ARE HERE AS A
VOLUNTEER, AND WE REALLY APPRECIATE THIS. YOU MAY TERMINATE YOUR
PARTICIPATION AT ANY TIME; HOWEVER, IF YOU DO, ALL THE EFFORT WE HAVE
PUT IN SO FAR WILL HAVE BEEN WASTED.


E-1


WE ENCOURAGE YOU TO DO THE BEST YOU CAN DURING THIS STUDY, AND
WE HOPE YOU WILL TAKE SOMETHING POSITIVE OUT OF IT FOR YOURSELF.
YOU WILL BE ASKED TO PROVIDE US WITH ONGOING INFORMATION
CONCERNING YOUR WORKLOAD DURING EACH TEST FLIGHT. PLEASE BE
AS OPEN AND ACCURATE AS YOU CAN.

THANK  YOU  AGAIN  FOR  YOUR  HELP.  THE  PROJECT  PILOT  WILL  BRIEF 
YOU  ON  YOUR  FLIGHT. 




E-2 




APPENDIX  F 


WORKLOAD  SCALE  INSTRUCTIONS 


ONE PURPOSE OF THIS RESEARCH IS TO OBTAIN AN HONEST EVALUATION OF
PILOT WORKLOAD, OR HOW HARD THE PILOT IS WORKING. BY WORKLOAD, WE
MEAN ALL THE PHYSICAL AND MENTAL EFFORT THAT YOU MUST EXERT IN
ORDER TO FLY THIS AIRCRAFT. THIS INCLUDES PLANNING, THINKING,
NAVIGATION, COMMUNICATION, AND CONTROLLING THE AIRCRAFT.

THE WAY YOU WILL TELL US HOW HARD YOU ARE WORKING IS BY PUSHING
THE BUTTONS NUMBERED FROM 1 TO 10 ON THE BOX MOUNTED BELOW THE
THROTTLES. I WILL REVIEW FOR YOU WHAT THESE BUTTONS MEAN IN TERMS
OF WORKLOAD. AT THE LOW END OF THE SCALE (1 OR 2), YOUR WORKLOAD IS
LOW: YOU CAN ACCOMPLISH EVERYTHING EASILY. AS THE NUMBERS INCREASE,
YOUR WORKLOAD IS GETTING HIGHER. NUMBERS 3 THROUGH 5 REPRESENT
INCREASING LEVELS OF MODERATE WORKLOAD, WHERE THE CHANCE OF ERROR
IS STILL LOW BUT STEADILY INCREASING. NUMBERS 6, 7, AND 8 REFLECT
RELATIVELY HIGH WORKLOAD, WHERE THERE IS SOME CHANCE OF MAKING
MISTAKES. AT THE HIGH END OF THE SCALE ARE NUMBERS 9 AND 10, WHICH
REPRESENT A VERY HIGH WORKLOAD, WHERE IT IS LIKELY THAT YOU WILL
HAVE TO LEAVE SOME TASKS INCOMPLETE.


ALL PILOTS, NO MATTER HOW PROFICIENT AND EXPERIENCED, CAN BE
EXPOSED TO ANY AND ALL LEVELS OF WORKLOAD. IT DOES NOT DETRACT


HE(SHE) IS WORKING HARD OR HARDLY WORKING. FEEL FREE TO USE
THE ENTIRE SCALE AND TELL US HONESTLY HOW HARD YOU ARE WORKING!


YOU WILL HEAR A TONE AND THE LIGHT ON THE BOX WILL COME ON. PUSH
THE BUTTON OF YOUR CHOICE AS SOON AS POSSIBLE AFTER YOU HEAR THE
TONE; THEN THE RED LIGHT WILL GO OUT. REMEMBER THAT THESE DATA
ARE NOT BEING COLLECTED BY NAME AND THAT YOUR PRIVACY IS PROTECTED.
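The instructions above attach verbal descriptions to bands of the 10-button scale. The sketch below maps a button press to its band and tallies a flight's responses. It is an illustration added here, not the study's analysis; it assumes that "numbers 3 and 5" in the scanned instructions means 3 through 5, since button 4 would otherwise have no description, and the function names are invented.

```python
from collections import Counter
from typing import Iterable


def workload_band(rating: int) -> str:
    """Map a 1-10 button press to the verbal band given in the instructions."""
    if not 1 <= rating <= 10:
        raise ValueError("rating must be between 1 and 10")
    if rating <= 2:
        return "low"               # everything can be accomplished easily
    if rating <= 5:
        return "moderate"          # chance of error low but increasing (assumed 3-5)
    if rating <= 8:
        return "relatively high"   # some chance of making mistakes
    return "very high"             # some tasks likely left incomplete


def band_counts(ratings: Iterable[int]) -> Counter:
    """Count how many button presses fell into each band during a flight."""
    return Counter(workload_band(r) for r in ratings)


if __name__ == "__main__":
    print(band_counts([2, 3, 4, 4, 6, 9]))
```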


F-1 




APPENDIX  G 


TEST  FLIGHT  BRIEFING 


You have been briefed by the psychologist as to the objectives
of these tests.

For this data collection flight, assume that you are taking a
round robin instrument flight and I am the FAA examiner giving
you your annual instrument check.

Assume that you are alone in the aircraft, so you will be required
to perform as both pilot and co-pilot. Atlantic City ground control
will give you an IFR clearance, which you will be required to read
back.

Perform a normal takeoff, rotating to 10° of pitch at approximately
100 knots IAS. Your performance will be evaluated on your ability
to maintain runway heading and aircraft pitch within and wings
level, while accelerating to the desired climb airspeed of
125 knots IAS.

After gear and flaps have been retracted, reduce to climb power
settings and maintain 125 knots IAS. During the climb phase,
your performance parameters will be ±5 on both heading and
airspeed, with a smooth rate of climb and bank during any turns.

After reaching assigned altitude, reduce to cruise settings so as
to maintain 175 knots IAS. During this en route portion of your
flight, your performance will be graded on your ability to
maintain altitude within ±100 feet and airspeed within ±5 knots IAS.

You  will  also  be  expected  to  keep  the  CDI  within  one  dot  on 
either  side  of  centerline  of  the  airway. 


G-1 


During descent to initial approach altitude, retard power to
maintain 175 knots IAS. You will again be graded on your ability
to maintain a smooth rate of descent with minimum bank and pitch
corrections while maintaining correct course alignment.

Final approach will be flown at 115 knots IAS, which you will be
expected to keep within -3 to +5 knots IAS. Gear should be
extended at glide slope intercept, and the degree of flaps at
which you are most comfortable will be acceptable. The grading
parameters for this portion of the flight will be as previously
stated on airspeed (-3/+5), with smooth, minimal pitch and bank
corrections to maintain localizer and glide slope centerline.
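The final approach tolerance is asymmetric: 3 knots below and 5 knots above the 115-knot target. The sketch below classifies individual airspeed samples against that band. It is a hypothetical illustration of how the stated tolerance could be applied to recorded IAS, not the project's grading procedure, and the constant and function names are invented.

```python
FINAL_APPROACH_IAS_KT = 115.0
LOWER_TOL_KT = 3.0   # up to 3 knots slow is acceptable
UPPER_TOL_KT = 5.0   # up to 5 knots fast is acceptable


def classify_approach_speed(ias_kt: float) -> str:
    """Classify one IAS sample against the -3/+5 knot final approach band."""
    error = ias_kt - FINAL_APPROACH_IAS_KT
    if error < -LOWER_TOL_KT:
        return "slow"
    if error > UPPER_TOL_KT:
        return "fast"
    return "within"


if __name__ == "__main__":
    for sample in (111, 112, 118, 120, 121):
        print(sample, classify_approach_speed(sample))
```

The band edges (112 and 120 knots) count as within tolerance here; the briefing does not say whether the limits are inclusive.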


G-2 


APPENDIX H


FLIGHT GEOMETRY


[Figure: flight geometry diagram; legible label: SIE (114.8).]


APPENDIX I


AIR TRAFFIC CONTROL SCRIPT


SiS"  Afir^rt  ^.T.fcrsasica  eaho,  Aet-”F,£is  Cj.?.;'  MaashsT  aaasyrad 
■aiss  'y.'orss?';,  asa  jsilc  in  VEiri,  tes^x!SK\”;n  f.'T.'S 

ala,  daw  pis?  2o's^  siSisa^s,  eXtiiM'::’?  iw  silnox  ti^Si 

lasdiag  crwS  daaartiag  oo«  th?«s,  anp-’isE  v?-,£c?s  -t©  S5js  7.1S 

rwRifl^r  'IS*  Jlsa*  eooaaa  <»?cr£i55!^  «:’,r~  'jraSt  aeasaix 

fTC«s®S  CMStC'X  friea  to  flt-tffStet,  tajiaea,  ssfa-it*?  «  initial,  ctttfflct 
tijffiS  jet!  tiara  reaai'ftd  i'^Straatica  «e'ao, 

CAT  AcSssitis  Tor'iiy’jna  ant  thra*  ana  :?T. 

■ATC  CiM  *"0  aijht  kii"  ciesrane*  ca  rtqufits. 

ATC  Ti’A  fty?  tint^-twa  tsni  tt  rv-aaar  oo«  Ehtaa. 


CAT  Go  <5jj«cd, 

/iiC  Caa  !i:s'".a  si^e  CiO*rfi'4  to  tha  Csdar  lia’ts  vnfi-.’a  ?.s  Jiioi, 

loslttni"  trc  si^arat  J’.i’:{Scv  eiaircistf.  tf  ttraa  5^ou*aan  .tt 

S'tffi  Xait).  Aitt®?  itaf.Trt.*?’?? j  sjclattia  rimvtw  ?ot  to  Join 

tlto  ait7  tta  ^avta.  att  Ttd'.Aj.  otatt.Kt  thtaa  ana, 

41v‘?'’'7»ivT''  flH*r,s^T^5.  cs'**  '-''.•“■'v  •‘■'‘'■^'pfi  5vvc{^ 

CAT  (ilay  ;>««'/.  eXaw.jjsaa.) 

\Ch®c5?i  r* 

.r.a^ , ) 

A.TS  Osa  aijiit  t'iie  tssi  art  eStaa,  t«.'<st  nica  i-^tn  rojs'.n, 

(Salciss  Si^T  tiaa  atatsd  cha?  ka'?e  r.?>:C  ".'Xl*.) 

CAT  One  ei'Jit  Hilo"  ^•;or  pt'?  v?  Si-tra  the  itiJ;o. 

ATS  vis  Eha  asswa?  Is  nasj.ssiro,  i'.rwa  1X11?  ia’a~"*,"'.'.a''.  fnar*  r'msa, 

oehar^Hra,  r«  aeot;-  ir  ssMSoaetry.? 

'7J  AtlaaSic  "ity  Sra-’ri<i  ”’?'’T~X'{!r  sawmj  s^rr-n".  Ja-iia',,  T.Th  so  Tar.»iaatjpra. 
ATG  fiper  aarm  leroa  j'rvj.i-'t  a'it<','-cr'.aa  os  r««a,<ja"r;, 

Gffil  AiSXestia  Cit'^  Te^cf,  an?  ai.'Jjt  itilo  it  TSOSy, 

ATG  Oao  fiirht  Hiis,  irajy.s?  an,a  tfcap.s  tX.ssa»^  Sc?  vAhssIS. 

SA.ST  Aslsstia  Sit”  Tn^tt,  t.",?';"’:?  ;'a  with  a-r?  r.t  the  Tssrh«:?. 

ATC  Snsts?”  ■'•'-•'■?’;?>  rcr.'-rt”  roe  *tv?.?3  ajs-trad  t"  ',*?  '., 


1-1 


OAT  ASleiitif^  'i?.-  ^s^?;ftv.T?«j  ■^«  sigfcj  ia  •’H.ih  fsv.. 

ATC  0»a  si?*;  siiXOs  yatJar  ca^taet,  aad  con.sf.u-'^j  ymse  cli^h  s?  SSs7j>3 
vhousae^i.  (Ailitaif*  ahaasc^sau^a  piles  eete^aa  it.) 

CAT  Sosar,  ws'm  laeifteg  ?ets  Sot  tlSTee  t^iOBsaa-!. 

ASC  (tihao  CAT  leaves  1,190  feat)  Oae  eight  kilo  tvt-n  right  hsai-Uog  ttfe  rsro 
sere  ea  intereept  th®  Aelatoie  City  eae  sevens  caa  radial  oa  course  osd 
es«Ji™  y«3?  tetvi«5  jck. 

CAT  Preeead. 

ATC  1  f«a-An  eea  eissy-four.  Sever  eishteea  niag,  o®<a  ys. 

ATC  2  teven  five  alpha,  traffic  11  e'clec):,  2  riles,  so-ithhouad,  tt  5. 

73A  Ee  Jey—jm'r®  ia  She  ao«a. 

ATC  3  TSitiW  faar  t*rg'’i  *  *!>*  ffiarker,  tare,  yij^s  hsaiicc  oae  sore  sero. 

eleari^  Sc?  tls®  IM,  tasssr  ei^tosa  sips  ft  the  tsr.rher. 

ATS  *  Sha  eijit  kilo,  traffic  J.C  e'slosS,  A  ri’ss.  r;''r:''ssitt  hr'Tti,  altitvifin 

itr’tjsrja. 

CAT  l&'r*  :?!1. 

CC  Atletssie  City  copro-sth.  Coast  Swerd  nis  or*  three  Crs  ss-'ct  srlsh  rev.. 

61327  ■ 

ATC  S  'Coeot  C«ffrd  s;is  <ws  three  tsio  rtrsn,  is'ent.  ‘tZmsit  Oity  s'.Sir.tS'i"  tvo 
oisser  riser  ecro, 

ATC  6  ’v-3v«i?»«r  three  fstje  sir,  trte  slphe,  5-5r.Jwh  c-^e  ets  St;?  tai  if’sss. 

ATC  7  Aa»!ri-t.n  fear  fifty^air.  cell  Cestsr  rr.  tat  st-;  sere  pc-'^R  t-.jo  c-ss  yn. 

ATS  S  Cse  eight  hilo,  yea'ro  slsar  ef  that  orovier-s  'rafCit. 

ATS  5)is  t'es  alicisa  rede?  emtess  prseaed  'direct  Ttetrs,  s-.iifs  te  S. 

AT5  ?  Hovrshsr  ere  tve  tic  Sira  ct\«,  trtCfis  IJ  o'tier.'r,  '  siles,  sat!S''’sorad, 
¥ry*?Trf f f*^%c 

F-^s<5!»5  Atts'atii  by 

otsC  fivr  r«'5h  'i'-.*  e>T’r;c-pr?!i5*ti'?'a  “HtS  g5  revr 


X-2 


ATS  Saatora  tr&inar  sen,  idras,  Alleasis  Sisj  Z-^  niz'z 

eitht  ?iv«,  ijcsSils  prtcsica  apjroach-aa . 

EM61Q  OEt,  tra'll  seh«  43  approach  to  t  fall  atop. 


ATC Roger, depart Cedar Lake heading one zero zero, vectors ILS runway one
three final approach course, maintain [illegible].

ATC 14 Six [illegible] bravo, contact McGuire Approach one two seven point [illegible].

ATC 15 One eight kilo, traffic 2 o'clock, 5 miles, westbound.

18K Roger, we're [illegible].

ATC 16 Eastern Trainer forty-six ten, cleared for the ILS via the Cedar Lake
transition.

[illegible]

EA4610 Roger, [illegible] out of 5, [illegible].

ATC 17 Forty-six ten, [illegible] nine [illegible], you're [illegible].

EA4610 Roger.

ATC 18 [illegible] two [illegible], squawk zero [illegible].

(ATC [illegible] completed.)

ATC 19 One eight kilo, [illegible] to [illegible].

18K Roger.

ATC 20 One eight kilo, cleared ILS approach via the [illegible] Cedar Lake
[illegible].

18K [illegible]

ATC 21 [illegible], traffic 3 o'clock, [illegible] miles, [illegible].


[illegible] Roger.


ATC 22 One eight kilo, is this [illegible]?

18K Roger.

[illegible] [illegible] eight [illegible], clear of traffic.

Roger.


ATC 24 All aircraft destined for the Cape Charles-Norfolk area, monitor VOR
voice for SIGMET concerning severe turbulence.

GAT Tower, November one eight kilo, with you at the marker.

ATC One eight kilo, wind calm, altimeter two niner eight five, runway one
three, cleared to land.

GAT Roger.

ATC 25 Seven two alpha, cleared for immediate takeoff or taxi clear of the
runway, traffic's on a 2-mile final.

72A  Roger,  on  the  go. 

When on ground:


One eight kilo, turn right at the next available taxiway, ground point
nine clearing.


APPENDIX  J 


FLIGHT  WORKLOAD  QUESTIONNAIRE 


PARTICIPANT  CODE  DATE 


FLIGHT WORKLOAD
QUESTIONNAIRE 


INSTRUCTIONS: THE FOUR QUESTIONS WHICH FOLLOW ARE TO BE COMPLETED AT THE END OF
EACH FLIGHT. YOUR RESPONSES SHOULD CONCERN ONLY THE FLIGHT YOU HAVE JUST
COMPLETED. DISREGARD ALL OTHERS. YOUR NAME IS NOT RECORDED ON THIS FORM, AND
WE WOULD APPRECIATE IT IF YOU WOULD BE AS ACCURATE AS YOU CAN. YOUR ANSWERS
ARE BEING USED FOR RESEARCH PURPOSES ONLY.


1. CIRCLE THE NUMBER BELOW WHICH BEST DESCRIBES HOW HARD YOU WERE WORKING
DURING THIS FLIGHT.


DESCRIPTION OF WORKLOAD CATEGORY                       RATING (CIRCLE ONE)

WORKLOAD LOW - ALL TASKS ACCOMPLISHED QUICKLY           1   2   3

MODERATE WORKLOAD - CHANCE OF ERROR OR                  4   5   6
OMISSION IS LOW

RELATIVELY HIGH WORKLOAD - CHANCE OF ERROR OR           7   8   9
OMISSION RELATIVELY HIGH

VERY HIGH WORKLOAD - NOT POSSIBLE TO PERFORM            10
ALL TASKS PROPERLY

2. WHAT FRACTION OF THE TIME WERE YOU BUSY DURING THE FLIGHT?


SELDOM HAVE MUCH TO DO

1  2  3  4  5  6  7  8  9  10

FULLY OCCUPIED AT ALL TIMES

3. HOW HARD DID YOU HAVE TO THINK DURING THIS FLIGHT?


ACTIVITY IS COMPLETELY AUTOMATIC,
MINIMAL THINKING AND PLANNING

1  2  3  4  5  6  7  8  9  10

A GREAT DEAL OF THINKING, PLANNING
AND CONCENTRATION WAS NECESSARY


4.  HOW  DID  YOU  FEEL  DURING  THIS  FLIGHT? 


THE EXPERIENCE IS RELAXING

1  2  3  4  5  6  7  8  9  10

THE EXPERIENCE IS VERY STRESSFUL


THANK YOU FOR YOUR ACCURATE ANSWERS.
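Question 1 groups its 1-10 ratings into four workload categories. The following minimal Python sketch shows how a researcher might record a completed form and look up the category for the question 1 rating. It assumes the band boundaries 1-3, 4-6, 7-9 and 10, which are read from the extracted form layout and may not match the original print exactly. The class, field and function names and the sample participant code and ratings are hypothetical, not taken from the report.

```python
from dataclasses import dataclass

# ASSUMPTION: band boundaries inferred from the printed form layout.
WORKLOAD_BANDS = (
    (1, 3, "WORKLOAD LOW - ALL TASKS ACCOMPLISHED QUICKLY"),
    (4, 6, "MODERATE WORKLOAD - CHANCE OF ERROR OR OMISSION IS LOW"),
    (7, 9, "RELATIVELY HIGH WORKLOAD - CHANCE OF ERROR OR OMISSION RELATIVELY HIGH"),
    (10, 10, "VERY HIGH WORKLOAD - NOT POSSIBLE TO PERFORM ALL TASKS PROPERLY"),
)


def _check_scale(name: str, value: int) -> None:
    """Each of the four questions is answered on a 1-10 scale."""
    if not 1 <= value <= 10:
        raise ValueError(f"{name} must be between 1 and 10, got {value}")


@dataclass(frozen=True)
class WorkloadResponse:
    """One completed questionnaire (hypothetical record layout)."""
    participant_code: str
    how_hard_working: int   # question 1
    fraction_busy: int      # question 2
    how_hard_thinking: int  # question 3
    stress: int             # question 4

    def __post_init__(self) -> None:
        # Reject answers outside the printed 1-10 scale.
        for name in ("how_hard_working", "fraction_busy",
                     "how_hard_thinking", "stress"):
            _check_scale(name, getattr(self, name))

    def workload_band(self) -> str:
        """Return the question 1 category containing the circled rating."""
        for low, high, label in WORKLOAD_BANDS:
            if low <= self.how_hard_working <= high:
                return label
        raise AssertionError("unreachable: rating already validated")


# Hypothetical responses for one flight.
response = WorkloadResponse("03", how_hard_working=5, fraction_busy=6,
                            how_hard_thinking=4, stress=3)
print(response.workload_band())
# prints MODERATE WORKLOAD - CHANCE OF ERROR OR OMISSION IS LOW
```

Validating in `__post_init__` rejects out-of-range answers when the record is created, rather than later during analysis.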







APPENDIX  K 


INTERRATER  RELIABILITY  CORRELATIONS  —  MASTERS 


INTERRATER  RELIABILITY  (OBSERVER  RATINGS)  CORRELATIONS 
MASTER  PILOTS 


Reviewer  Pairing 


Participant  Run 


03  1 

03  2 

04  1 

04  2 

06  1 

06  2 

07  1 

07  2 

08  1 

08  2 

09  1 

09  2 

10  1 

10  2 

22  1 

22  2 

23  1 

23  2 

24  1 

24  2 

25  1 

25  2 

31  1 

31  2 


All  Masters 

All  Participants 
On  All  Flights 


1.2 

1.3 

2.3 

.77 

.68 

.91 

.88 

.92 

.95 

.93 

.86 

.92 

.96 

.98 

.97 

.92 

.89 

.93 

.92 

.90 

.95 

.95 

.95 

.99 

.96 

.87 

.87 

.91 

.90 

.96 

.93 

.91 

.96 

.84 

.84 

.94 

.83 

.88 

.80 

.81 

.72 

.91 

.95 

.94 

.97 

.89 

.84 

.92 

.95 

.94 

.96 

.96 

.96 

.95 

.92 

.91 

.94 

.96 

.95 

.97 

.97

.94 

.97 

.97 

.94 

.96 

.97 

.89

.91 

.91 

.82

.90 

.91 

.88

.94 

.84 

.83 

.86 
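Each entry in the interrater tables is a correlation between the ratings two observers gave the same flight. The "Reviewer Pairing" columns 1.2, 1.3 and 2.3 are read here as observer pairs (1, 2), (1, 3) and (2, 3). The Python sketch below shows how one row of such a table could be computed. It assumes a Pearson product-moment correlation over the items of the observer rating form; this appendix does not state the coefficient or the item set. The function names and the sample ratings are hypothetical.

```python
from __future__ import annotations

import math
from itertools import combinations


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length rating lists.

    ASSUMPTION: Pearson r over rating-form items; the appendix does not specify.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length rating lists with at least two items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant rating list has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def reviewer_pairings(ratings: dict[int, list[float]]) -> dict[str, float]:
    """Correlate every pair of observers; keys follow the '1.2' column style."""
    return {
        f"{a}.{b}": pearson_r(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


# Hypothetical item ratings from three observers for one flight.
flight_ratings = {
    1: [4, 5, 3, 6, 5, 2, 4],
    2: [4, 6, 3, 5, 5, 2, 3],
    3: [3, 5, 4, 6, 4, 2, 4],
}
for pairing, r in reviewer_pairings(flight_ratings).items():
    print(f"{pairing}  {r:.2f}")
```

With three observers, `combinations` yields exactly the three pairings tabled for each run.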



APPENDIX  L 


INTERRATER  RELIABILITY  CORRELATIONS  —  JOURNEYMEN 


INTERRATER  RELIABILITY  (OBSERVER  RATINGS)  CORRELATIONS 
JOURNEYMAN  PILOTS 


                     Reviewer Pairing

Participant   Run    1.2    1.3    2.3

    12         1     .86    .62    .65
    12         2     .90    .89    .82
    13         1     .52    .74    .24
    13         2     .79    .76    .81
    14         1     .73    .58    .68
    14         2     .76    .61    .80
    15         1     .74    .78    .62
    15         2     .81    .78    .86
    16         1     .80    .73    .79
    16         2     .94    .88    .93
    17         1     .78    .79    .88
    17         2     .81    .77    .80
    18         1     .81    .84    .77
    18         2     .82    .82    .90
    19         1     .63    .74    .71
    19         2     .86    .77    .87
    20         1     .54    .68    .56
    20         2     .89    .76    .87
    26         1     .94    .92    .93
    26         2     .85    .89    .85
    27         1     .88    .91    .92
    27         2     .58    .77    .61
    28         1     .76    .53    .69
    28         2     .27    .36    .36

All Journeymen       .77    .76    .76

All Participants
On All Flights       .84    .83    .86
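The sampling distribution of a correlation is skewed, especially near 1. A standard way to summarize a column of correlations is therefore to average them in Fisher's z = atanh(r) and transform the mean back. The Python sketch below applies that to the 1.2 column for the 24 journeyman flights listed above. Fisher's z averaging is a swapped-in illustration. This appendix does not say the "All Journeymen" row was computed this way, and the result need not equal the tabled value. The function names are hypothetical.

```python
from __future__ import annotations

import math

# Reviewer pairing 1.2 for journeyman participants 12-20 and 26-28,
# runs 1 and 2, in the order listed in Appendix L.
JOURNEYMEN_1_2 = [
    .86, .90, .52, .79, .73, .76, .74, .81, .80, .94, .78, .81,
    .81, .82, .63, .86, .54, .89, .94, .85, .88, .58, .76, .27,
]


def fisher_z_mean(rs: list[float]) -> float:
    """Average correlations in Fisher's z space, then map back to r.

    Illustrative choice only; not the report's stated method.
    """
    if not rs:
        raise ValueError("need at least one correlation")
    if any(abs(r) >= 1 for r in rs):
        raise ValueError("Fisher's z is undefined for |r| >= 1")
    mean_z = sum(math.atanh(r) for r in rs) / len(rs)
    return math.tanh(mean_z)


def arithmetic_mean(rs: list[float]) -> float:
    """Plain average of the correlations, for comparison."""
    return sum(rs) / len(rs)


print(f"arithmetic mean: {arithmetic_mean(JOURNEYMEN_1_2):.2f}")
print(f"Fisher z mean:   {fisher_z_mean(JOURNEYMEN_1_2):.2f}")
```

For non-negative correlations, the z-average is never below the plain mean because atanh is convex on [0, 1). A single low-reliability flight, such as participant 28, run 2 (.27), therefore pulls the z-average down less than it pulls down the plain mean.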



